Abstract

Motivated by change point problems in time series and the detection of textured objects in 
images, we consider the problem of detecting a piece of a Gaussian Markov random field hidden 
in white Gaussian noise. We derive minimax lower bounds and propose near-optimal tests. 


1 Introduction 

Anomaly detection is important in a number of applications, including surveillance and environment 
monitoring systems using sensor networks, object tracking from video or satellite images, and tumor 
detection in medical imaging. The most common model is that of an object or signal of unusually 
high amplitude hidden in noise. In other words, one is interested in detecting the presence of an 
object in which the mean of the signal is different from that of the background. We refer to this as 
the detection-of-means problem. In many situations, however, the anomaly manifests itself as unusual
dependencies in the data. This detection-of-correlations problem is the one that we consider in this paper.

1.1 Setting and hypothesis testing problem 

It is common to model dependencies by a Gaussian random field $X = (X_i : i \in V)$, where $V \subset V_\infty$
is of size $|V| = n$, while $V_\infty$ is countably infinite. We focus on the important example of a
$d$-dimensional integer lattice

$$V = \{1, \dots, m\}^d \subset V_\infty = \mathbb{Z}^d . \tag{1}$$

We formalize the task of detection as the following hypothesis testing problem. One observes
a realization of $X = (X_i : i \in V)$, where the $X_i$'s are known to be standard normal. Under the
null hypothesis $\mathcal{H}_0$, the $X_i$'s are independent. Under the alternative hypothesis $\mathcal{H}_1$, the $X_i$'s are
correlated in one of the following ways. Let $\mathcal{C}$ be a class of subsets of $V$. Each set $S \in \mathcal{C}$ represents
a possible anomalous subset of the components of $X$. Specifically, when $S \in \mathcal{C}$ is the anomalous
subset of nodes, each $X_i$ with $i \notin S$ is still independent of all the other variables, while $(X_i : i \in S)$
coincides with $(Y_i : i \in S)$, where $Y = (Y_i : i \in V_\infty)$ is a stationary Gaussian Markov random field.
We emphasize that, in this formulation, the anomalous subset $S$ is only known to belong to $\mathcal{C}$.

We are thus addressing the problem of detecting a region of a Gaussian Markov random field 
against a background of white noise. This testing problem models important detection problems 
such as the detection of a piece of a time series in a signal and the detection of a textured object 
in an image, which we describe below. Before doing that, we further detail the model and set some 
foundational notation and terminology. 

1.2 Tests and minimax risk

We denote the distribution of $X$ under $\mathcal{H}_0$ by $\mathbb{P}_0$. The distribution of the zero-mean stationary
Gaussian Markov random field $Y$ is determined by its covariance operator $\Gamma = (\Gamma_{i,j} : i,j \in V_\infty)$
defined by $\Gamma_{i,j} = \mathbb{E}[Y_i Y_j]$. We denote the distribution of $X$ under $\mathcal{H}_1$ by $\mathbb{P}_{S,\Gamma}$ when $S \in \mathcal{C}$ is the
anomalous set and $\Gamma$ is the covariance operator of the Gaussian Markov random field $Y$.

A test is a measurable function $f : \mathbb{R}^V \to \{0,1\}$. When $f(X) = 0$, the test accepts the null
hypothesis, and it rejects it otherwise. The probability of type I error of a test $f$ is $\mathbb{P}_0\{f(X) = 1\}$.
When $S \in \mathcal{C}$ is the anomalous set and $Y$ has covariance operator $\Gamma$, the probability of type II error
is $\mathbb{P}_{S,\Gamma}\{f(X) = 0\}$. In this paper we evaluate tests based on their worst-case risks. The risk of a
test $f$ corresponding to a covariance operator $\Gamma$ and class of sets $\mathcal{C}$ is defined as

$$R_{\mathcal{C},\Gamma}(f) = \mathbb{P}_0\{f(X) = 1\} + \max_{S \in \mathcal{C}} \mathbb{P}_{S,\Gamma}\{f(X) = 0\} . \tag{2}$$

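Defining the risk this way is meaningful when the distribution of $Y$ is known, meaning that $\Gamma$ is
available to the statistician. In this case, the minimax risk is defined as

$$R^*_{\mathcal{C},\Gamma} = \inf_f R_{\mathcal{C},\Gamma}(f) , \tag{3}$$

where the infimum is over all tests $f$. When $\Gamma$ is only known to belong to some class of covariance
operators $\mathfrak{G}$, it is more meaningful to define the risk of a test $f$ as

$$R_{\mathcal{C},\mathfrak{G}}(f) = \mathbb{P}_0\{f(X) = 1\} + \max_{\Gamma \in \mathfrak{G}} \max_{S \in \mathcal{C}} \mathbb{P}_{S,\Gamma}\{f(X) = 0\} . \tag{4}$$

The corresponding minimax risk is defined as

$$R^*_{\mathcal{C},\mathfrak{G}} = \inf_f R_{\mathcal{C},\mathfrak{G}}(f) . \tag{5}$$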

In this paper we consider situations in which the covariance operator $\Gamma$ is known (i.e., the test $f$
is allowed to be constructed using this information) and other situations in which $\Gamma$ is unknown but
is assumed to belong to a class $\mathfrak{G}$. When $\Gamma$ is known (resp. unknown), we say that a test $f$
asymptotically separates the two hypotheses if $R_{\mathcal{C},\Gamma}(f) \to 0$ (resp. $R_{\mathcal{C},\mathfrak{G}}(f) \to 0$); and we say that
the hypotheses merge asymptotically if $R^*_{\mathcal{C},\Gamma} \to 1$ (resp. $R^*_{\mathcal{C},\mathfrak{G}} \to 1$), as $n = |V| \to \infty$. We note that,
as long as $\Gamma \in \mathfrak{G}$, $R^*_{\mathcal{C},\Gamma} \le R^*_{\mathcal{C},\mathfrak{G}}$, and that $R^*_{\mathcal{C},\mathfrak{G}} \le 1$, since the test $f \equiv 1$ (which always rejects) has
risk equal to 1.

At a high level, our results are as follows. We characterize the minimax testing risk for both
known ($R^*_{\mathcal{C},\Gamma}$) and unknown ($R^*_{\mathcal{C},\mathfrak{G}}$) covariances when the anomaly is a Gaussian Markov random
field. More precisely, we give conditions on $\Gamma$ or $\mathfrak{G}$ under which the hypotheses merge asymptotically,
so that the detection problem is nearly impossible. Under nearly matching conditions, we exhibit tests
that asymptotically separate the hypotheses. Our general results are illustrated in the following
subsections.

1.3 Example: detecting a piece of time series 

As a first example of the general problem described above, consider the case of observing a time
series $X_1, \dots, X_n$. This corresponds to the setting of the lattice (1) in dimension $d = 1$. Under the
null hypothesis, the $X_i$'s are i.i.d. standard normal random variables. We assume that the anomaly
comes in the form of temporal correlations over an (unknown) interval $S = \{i+1, \dots, i+k\}$ of,
say, known length $k < n$. Here, $i \in \{0, 1, \dots, n-k\}$ is thus unknown. Specifically, when $S$ is the
anomalous interval, $(X_{i+1}, \dots, X_{i+k}) \sim (Y_{i+1}, \dots, Y_{i+k})$, where $(Y_i : i \in \mathbb{Z})$ is an autoregressive
process of order $h$ (abbreviated $\mathrm{AR}_h$) with zero mean and unit variance, that is,

$$Y_i = \psi_1 Y_{i-1} + \cdots + \psi_h Y_{i-h} + \sigma Z_i , \quad \forall i \in \mathbb{Z} , \tag{6}$$

where $(Z_i : i \in \mathbb{Z})$ are i.i.d. standard normal random variables, $\psi_1, \dots, \psi_h \in \mathbb{R}$ are the coefficients
of the process (assumed to be stationary) and $\sigma > 0$ is such that $\mathrm{Var}(Y_i) = 1$ for all $i$. Note that $\sigma$
is a function of $\psi_1, \dots, \psi_h$, so that the model has effectively $h$ parameters. It is well known that the
parameters $\psi_1, \dots, \psi_h$ define a stationary process when the roots of the polynomial $z^h - \sum_{i=1}^h \psi_i z^{h-i}$
in the complex plane lie within the open unit circle. See Brockwell and Davis (1991) for a standard
reference on time series.

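In the simplest setting $h = 1$ and the parameter space for $\psi$ is $(-1,1)$. Then, the hypothesis
testing problem is to distinguish

$$\mathcal{H}_0 : X_1, \dots, X_n \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1) ,$$

versus

$$\mathcal{H}_1 : \exists\, i \in \{0, 1, \dots, n-k\} \text{ such that } X_1, \dots, X_i, X_{i+k+1}, \dots, X_n \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1)$$

and $(X_{i+1}, \dots, X_{i+k})$ is independent of $X_1, \dots, X_i, X_{i+k+1}, \dots, X_n$ with

$$X_{i+j+1} - \psi X_{i+j} \stackrel{\text{iid}}{\sim} \mathcal{N}(0, 1 - \psi^2) , \quad \forall j \in \{1, \dots, k-1\} .$$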

Typical realizations of the observed vector under the null and alternative hypotheses are illustrated 
in Figure 1. 



Figure 1: Top: a realization of the observed time series under the null hypothesis (white noise).
Bottom: a realization under the alternative with anomalous interval $S = \{201, \dots, 250\}$, assuming
an $\mathrm{AR}_1$ covariance model with parameter $\psi = 0.9$.
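
The two panels of Figure 1 can be imitated with a short simulation. The sketch below is only an illustration under stated assumptions: the series length n = 500, the seeding scheme and the function names `simulate_series` and `lag1_autocorrelation` are hypothetical and not taken from the paper. It uses the fact that a unit-variance AR(1) path satisfies Y_{j+1} = psi Y_j + sqrt(1 - psi^2) Z_{j+1}.

```python
import math
import random


def simulate_series(n, k=None, start=None, psi=0.0, seed=0):
    """Return X_1..X_n as a list: white noise, optionally with an AR(1) segment.

    With k=None the series is pure white noise (null hypothesis). Otherwise
    positions start, ..., start+k-1 (0-based) carry a stationary AR(1) path.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if k is not None:
        sigma = math.sqrt(1.0 - psi * psi)  # innovation scale so that Var(Y_i) = 1
        y = rng.gauss(0.0, 1.0)             # stationary start: Y ~ N(0, 1)
        for j in range(start, start + k):
            x[j] = y
            y = psi * y + sigma * rng.gauss(0.0, 1.0)
    return x


def lag1_autocorrelation(values):
    """Average of consecutive products; estimates psi for zero-mean, unit-variance data."""
    m = len(values) - 1
    return sum(values[j] * values[j + 1] for j in range(m)) / m


# Assumed setup: n = 500, anomalous interval {201, ..., 250} (0-based start 200), psi = 0.9.
null_series = simulate_series(500)
alt_series = simulate_series(500, k=50, start=200, psi=0.9)
print(lag1_autocorrelation(alt_series[200:250]))
```

Because both calls use the same seed, the two series agree outside the anomalous interval, which makes the effect of the correlated segment easy to isolate.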

Gaussian autoregressive processes and other correlation models are special cases of Gaussian
Markov random fields, and therefore this setting is a special case of our general framework, with $\mathcal{C}$
being the class of discrete intervals of length $k$. In the simplest case, the length of the anomalous
interval is known beforehand. In more complex settings, it is unknown, in which case $\mathcal{C}$ may be
taken to be the class of all intervals within $V$ of length at least $k_{\min}$.

This testing problem has been extensively studied in the slightly different context of change-point
analysis, where under the null hypothesis $X_1, \dots, X_n$ are generated from an $\mathrm{AR}_h(\psi^0)$ process
for some $\psi^0 \in \mathbb{R}^h$, while under the alternative hypothesis there is an $i \in V$ such that $X_1, \dots, X_i$
and $X_{i+1}, \dots, X_n$ are generated from $\mathrm{AR}_h(\psi^0)$ and $\mathrm{AR}_h(\psi^1)$, with $\psi^0 \ne \psi^1$, respectively. The
order $h$ is often given. In fact, instead of assuming autoregressive models, nonparametric models
are often favored. See, for example, Davis et al. (1995); Giraitis and Leipus (1992); Horvath
(1993); Huskova et al. (2007); Lavielle and Ludeña (2000); Paparoditis (2009); Picard (1985);
Priestley and Subba Rao (1969) and many other references therein. These papers often suggest
maximum likelihood tests whose limiting distributions are studied under the null and (sometimes
fixed) alternative hypotheses. For example, in the special case of $h = 1$, such a test would reject
when $|\hat\psi|$ is large, where $\hat\psi$ is the maximum likelihood estimate for $\psi$. In particular, from Picard
(1985), we can speculate that such a test can asymptotically separate the hypotheses in the simplest
setting described above when $\psi k^{\alpha} \to \infty$ for some $\alpha < 1/2$ fixed. See also Huskova et al. (2007);
Paparoditis (2009) for power analyses against fixed alternatives.

Our general results imply the following in the special case when the anomaly comes in the form
of an autoregressive process with unknown parameter $\psi \in \mathbb{R}^h$. We note that the order of the
autoregressive model $h$ is allowed to grow with $n$ in this asymptotic result.

Corollary 1. Assume $n, k \to \infty$, and that $h = o\big(\sqrt{k/\log(n)} \wedge k^{1/4}\big)$. Denote by $\mathfrak{F}(h,r)$ the class
of covariance operators corresponding to $\mathrm{AR}_h$ processes with valid parameter $\psi = (\psi_1, \dots, \psi_h)$
satisfying $\|\psi\|_2^2 \ge r^2$. Then $R^*_{\mathcal{C},\mathfrak{F}(h,r)} \to 1$ when

$$r^2 \le C_1 \left( \log(n/k)/k + \sqrt{h \log(n/k)}/k \right) . \tag{7}$$

Conversely, if $f$ denotes the pseudo-likelihood test of Section 4.2, then $R_{\mathcal{C},\mathfrak{F}(h,r)}(f) \to 0$ when

$$r^2 \ge C_2 \left( \log(n)/k + \sqrt{h \log(n)}/k \right) . \tag{8}$$

In both cases, $C_1$ and $C_2$ denote numerical constants.

Remark 1. In the interesting setting where $k = n^{\kappa}$ for some $\kappa > 0$ fixed, the lower and upper
bounds provided by Corollary 1 match up to a multiplicative constant that depends only on $\kappa$.

Despite an extensive literature on the topic, we are not aware of any other minimax optimality 
result for time series detection. 

1.4 Example: detecting a textured region 

In image processing, the detection of textured objects against a textured background is relevant
in a number of applications, such as the detection of local fabric defects in the textile industry
by automated visual inspection (Kumar, 2008), the detection of a moving object in a textured
background (Kim et al., 2005; Yilmaz et al., 2006), the identification of tumors in medical imaging
(James et al., 2001; Karkanis et al., 2003), the detection of man-made objects in natural
scenery (Kumar and Hebert, 2003), the detection of sites of interest in archeology (Litton and
Buck, 1995) and of weeds in crops (Dryden et al., 2003). In all these applications, the object is
generally small compared to the size of the image.

Common models for texture include Markov random fields (Cross and Jain, 1983) and joint
distributions over filter banks such as wavelet pyramids (Manjunath and Ma, 1996; Portilla and
Simoncelli, 2000). We focus here on textures that are generated via Gaussian Markov random fields
(Chellappa and Chatterjee, 1985; Zhu et al., 1998). Our goal is to detect a textured object hidden
in white noise. For this discussion, we place ourselves in the lattice setting (1) in dimension $d = 2$.
Just like before, under $\mathcal{H}_0$, the $(X_i : i \in V)$ are independent standard normal random variables.
Under $\mathcal{H}_1$, when the region $S \subset V$ is anomalous, the $(X_i : i \notin S)$ are still i.i.d. standard normal,
while $(X_i : i \in S) \sim (Y_i : i \in S)$, where $(Y_i : i \in \mathbb{Z}^2)$ is such that for each $i \in \mathbb{Z}^2$, the conditional
distribution of $Y_i$ given the rest of the variables $Y^{(-i)} := (Y_j : j \ne i)$ is normal with mean

$$\sum_{(t_1,t_2) \in [-h,h]^2 \setminus \{(0,0)\}} \phi_{t_1,t_2}\, Y_{i+(t_1,t_2)} \tag{9}$$

and variance $\sigma_\phi^2$, where the $\phi_{t_1,t_2}$'s are the coefficients of the process and $\sigma_\phi$ is such that $\mathrm{Var}(Y_i) = 1$
for all $i$. The set of valid parameters $\phi$ is defined in Section 2.1. A simple sufficient condition is
$\|\phi\|_1 = \sum_{(t_1,t_2) \in [-h,h]^2 \setminus \{(0,0)\}} |\phi_{t_1,t_2}| < 1$. In this model, the dependency neighborhood of $i \in \mathbb{Z}^2$
is $i + [-h,h]^2 \cap \mathbb{Z}^2$. One of the simplest cases is when $h = 1$ and $\phi_{t_1,t_2} = \phi$ when $(t_1,t_2) \in
\{(\pm 1,0), (0,\pm 1)\}$ for some $\phi \in (-1/4, 1/4)$, and the anomalous region is a discrete square; see
Figure 2 for a realization of the resulting process.

This is a special case of our setting. While intervals are natural in the case of time series, squares 
are rather restrictive models of anomalous regions in images. We consider instead the “blob-like” 
regions (to be defined later) that include convex and star-shaped regions. 



Figure 2: Left: white noise, no anomalous region is present. Right: a square anomalous region
is present. In this example on the $50 \times 50$ grid, the anomalous region is a $15 \times 15$ square piece
from a Gaussian Markov random field with neighborhood radius $h = 1$ and coefficient vector
$\phi_{t_1,t_2} = \phi := \frac{1}{4}(1 - 10^{-4})$ when $(t_1,t_2) \in \{(\pm 1,0), (0,\pm 1)\}$, and zero otherwise.

A number of publications address the related problems of texture classification (Kervrann and
Heitz, 1995; Varma and Zisserman, 2005; Zhu et al., 1998) and texture segmentation (Galun et al.,
2003; Grigorescu et al., 2002; Hofmann et al., 1998; Jain and Farrokhnia, 1991; Malik et al., 2001).
In fact, this literature is quite extensive. Only very few papers address the corresponding change-point
problem (Palenichka et al., 2000; Shahrokni et al., 2004) and we do not know of any theoretical
results in this literature. Our general results (in particular, Corollary 4) imply the following.

Corollary 2. Assume $n, k \to \infty$, and that $h = o\big(\sqrt{k/\log(n)} \wedge \cdots\big)$. Denote by $\mathfrak{G}(h,r)$ the class
of covariance operators corresponding to stationary Gaussian Markov Random Fields with valid
parameter (see Section 2.1 for more details) $\phi = (\phi_{i,j})_{(i,j) \in \{-h,\dots,h\}^2 \setminus \{0\}}$ satisfying $\|\phi\|_2^2 \ge r^2$. Then
$R^*_{\mathcal{C},\mathfrak{G}(h,r)} \to 1$ when

$$r^2 \le C_1 \left[ \frac{\log(n/k)}{k} \vee \frac{\sqrt{h^2 \log(n/k)}}{k} \right] . \tag{10}$$

Conversely, if $f$ denotes the pseudo-likelihood test of Section 4.2, then $R_{\mathcal{C},\mathfrak{G}(h,r)}(f) \to 0$ when

$$r^2 \ge C_2 \left[ \frac{\log(n/k)}{k} \vee \frac{\sqrt{h^2 \log(n/k)}}{k} \right] . \tag{11}$$

In both cases, $C_1$ and $C_2$ denote positive numerical constants.

Informally, the lower bound on the magnitude of the coefficient vector $\phi$, namely $r^2$, quantifies
the extent to which the variables $Y_i$ are explained by the rest of the variables $Y^{(-i)}$ as in (9).

Although not in the literature on change-point or object detection, Anandkumar et al. (2009)
is the only other paper developing theory in a similar context. It considers a spatial model where
points $\{x_i, i \in [N]\}$ are sampled uniformly at random in some bounded region and a
nearest-neighbor graph is formed. On the resulting graph, variables are observed at the nodes. Under the
(simple) null hypothesis, the variables are i.i.d. zero-mean normal. Under the (simple) alternative,
the variables arise from a Gaussian Markov random field with covariance operator of the form $\Gamma_{i,j} \propto
g(\|x_i - x_j\|)$, where $g$ is a known function. The paper analyzes the large-sample behavior of the
likelihood ratio test.


1.5 More related work 

As we mentioned earlier, the detection-of-means setting is much more prevalent in the literature. 
When the anomaly has no a priori structure, the problem is that of multiple testing; see, for 
example, Baraud (2002); Donoho and Jin (2004); Ingster (1999) for papers testing the global null 
hypothesis. Much closer to what interests us here, the problem of detecting objects with various 
geometries or combinatorial properties has been extensively analyzed, for example, in some of our 
earlier work (Addario-Berry et al., 2010; Arias-Castro et al., 2011, 2008) and elsewhere (Desolneux
et al., 2003; Walther, 2010). We only cite a few publications that focus on theory. The applied
literature is vast; see Arias-Castro et al. (2011) for some pointers.

Despite its importance in practice, as illustrated by the examples and references given in Sections
1.3 and 1.4, the detection-of-correlations setting has received comparatively much less attention,
at least from theoreticians. Here we find some of our own work (Arias-Castro et al., 2012,
2015). In the first of these papers, we consider a sequence $X_1, \dots, X_n$ of standard normal random
variables. Under the null, they are independent. Under the alternative, there is a set $S$ in a class of
interest $\mathcal{C}$ where the variables are correlated. We consider the unstructured case where $\mathcal{C}$ is the class
of all sets of size $k$ (given) and also various structured cases, and in particular, that of intervals.
This would appear to be the same as in the present lattice setting in dimension $d = 1$, but the
important difference is that the correlation operator $\Gamma$ is not constrained, and in particular no
Markov random field structure is assumed. The second paper extends the setting to higher dimensions,
thus testing whether some coordinates of a high-dimensional Gaussian vector are correlated
or not. When the correlation structure in the anomaly is arbitrary, the setting overlaps with that
of sparse principal component analysis (Berthet and Rigollet, 2013; Cai et al., 2013). The problem
is also connected to covariance testing in high dimensions; see, e.g., Cai and Ma (2013). We refer
the reader to the above-mentioned papers for further references.

1.6 Contribution and content

The present paper thus extends previous work on the detection-of-means setting to the detection- 
of-correlations setting in the (structured) context of detecting signals/objects in time series/images. 
The paper also extends some of our own work on the detection-of-correlations to Markov random 
field models, which are typically much more appropriate in the context of detection in signals 
and images. The theory in the detection-of-correlations setting is more complicated than in the
detection-of-means setting, and in particular deriving exact minimax (first-order) results remains an
open problem. Compared to our previous work on the detection-of-correlations setting, the Markovian
assumption makes the problem significantly more complex, as it requires handling Markov
random fields, which are conceptually more complex objects. As a result, the proof technique is
by and large novel, at least in the detection literature.

The rest of the paper is organized as follows. In Section 2 we lay down some foundations on 
Gaussian Markov Random Fields, and in particular, their covariance operators, and we also derive a 
general minimax lower bound that is used several times in the paper. In the remainder of the paper, 
we consider detecting correlations in a finite-dimensional lattice (1), which includes the important 
special cases of time series and textures in images. We establish lower bounds, both when the 
covariance matrix is known (Section 3) or unknown (Section 4) and propose test procedures that 
are shown to achieve the lower bounds up to multiplicative constants. In Section 5, we specialize our 
general results to specific classes of anomalous regions such as classes of cubes, and more generally, 
“blobs.” In Section 6 we outline possible generalizations and further work. The proofs are gathered 
in Section 7. 


2 Preliminaries 

In this paper we derive upper and lower bounds for the minimax risk, both when $\Gamma$ is known as in
(3) and when it is unknown as in (5), the latter requiring a substantial amount of additional work.
For the sake of exposition, we sketch here the general strategy for obtaining minimax lower bounds,
adapting the approach initiated in Ingster (1993) to detection-of-correlation problems.
This allows us to separate the technique used to derive minimax lower bounds from the technique
required to handle Gaussian Markov random fields.


2.1 Some background on Gaussian Markov random fields 


We elaborate on the setting described in Sections 1.1 and 1.2. As the process $Y$ is indexed by $\mathbb{Z}^d$,
note that all the indices $i$ of $\phi$ and $\Gamma$ are $d$-dimensional. Given a positive integer $h$, denote by $N_h$ the
integer lattice $\{-h, \dots, h\}^d \setminus \{0\}^d$ with $(2h+1)^d - 1$ nodes. For any nonsingular covariance operator
$\Gamma$ of a stationary Gaussian Markov random field over $\mathbb{Z}^d$ with unit variance and neighborhood $N_h$,
there exists a unique vector $\phi$ indexed by the nodes of $N_h$ satisfying $\phi_i = \phi_{-i}$ such that, for all
$i, j \in \mathbb{Z}^d$,

$$\frac{\Gamma^{-1}_{i,j}}{\Gamma^{-1}_{i,i}} = \begin{cases} -\phi_{i-j} & \text{if } 1 \le |i-j|_\infty \le h , \\ 1 & \text{if } i = j , \\ 0 & \text{otherwise} , \end{cases} \tag{12}$$

where $\Gamma^{-1}$ denotes the inverse of the covariance operator $\Gamma$. Consequently, there exists a bijective
map from the collection of invertible covariance operators of stationary Gaussian Markov random
fields over $\mathbb{Z}^d$ with unit variance and neighborhood $N_h$ to some subset $\Phi_h \subset \mathbb{R}^{N_h}$. Given $\phi \in \Phi_h$,
$\Gamma(\phi)$ denotes the unique covariance operator satisfying $\Gamma_{i,i} = 1$ and (12). It is well known that
$\Phi_h$ contains the set of vectors $\phi$ whose $\ell_1$-norm is smaller than one, that is,

$$\{\phi \in \mathbb{R}^{N_h} : \|\phi\|_1 < 1\} \subset \Phi_h ,$$

as the corresponding operator $\Gamma^{-1}(\phi)$ is diagonally dominant in that case. In fact, the parameter
space $\Phi_h$ is characterized by the Fast Fourier Transform (FFT) as follows:

$$\Phi_h = \Big\{ \phi : 1 + \sum_{1 \le |i|_\infty \le h} \phi_i \cos(\langle i, \omega \rangle) > 0 , \ \forall \omega \in (-\pi, \pi]^d \Big\} ,$$

where $\langle \cdot, \cdot \rangle$ denotes the scalar product in $\mathbb{R}^d$. The interested reader is referred to
(Guyon, 1995, Sect. 1.3) or (Rue and Held, 2005, Sect. 2.6) for further details and discussions. For
$\phi \in \Phi_h$, define $\sigma_\phi^2 := 1/\Gamma^{-1}_{i,i}(\phi)$.
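
As a numerical illustration of the two validity conditions above (the sufficient condition that the l1-norm is below one, and the cosine positivity condition), the following sketch evaluates both for a small neighborhood. It is a grid-based sanity check rather than a proof of membership in the parameter space; the function names and the grid size are hypothetical, and the sign convention of the cosine sum is copied from the display above.

```python
import itertools
import math


def neighborhood(h, d):
    """N_h: all nonzero offsets in {-h, ..., h}^d."""
    return [v for v in itertools.product(range(-h, h + 1), repeat=d) if any(v)]


def l1_norm(phi):
    """Sum of absolute values of the coefficients (phi maps offsets to values)."""
    return sum(abs(c) for c in phi.values())


def passes_cosine_check(phi, d, grid=64):
    """Check 1 + sum_i phi_i cos(<i, w>) > 0 on a regular grid of (-pi, pi]^d."""
    axis = [-math.pi + 2.0 * math.pi * (j + 1) / grid for j in range(grid)]
    for w in itertools.product(axis, repeat=d):
        total = 1.0 + sum(c * math.cos(sum(a * b for a, b in zip(i, w)))
                          for i, c in phi.items())
        if total <= 0.0:
            return False
    return True


def four_neighbour_phi(value):
    """d = 2, h = 1: coefficient `value` on (+-1, 0) and (0, +-1), zero elsewhere."""
    return {v: (value if abs(v[0]) + abs(v[1]) == 1 else 0.0)
            for v in neighborhood(1, 2)}


print(l1_norm(four_neighbour_phi(0.2)), passes_cosine_check(four_neighbour_phi(0.2), d=2))
print(passes_cosine_check(four_neighbour_phi(0.3), d=2))
```

For the four-nearest-neighbor model the sum is smallest at the grid point (pi, pi), where it equals 1 - 4 x value, which is consistent with the parameter range (-1/4, 1/4) quoted in Section 1.4.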

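The correlated process $Y = (Y_i : i \in \mathbb{Z}^d)$, which is centered Gaussian with covariance operator $\Gamma(\phi)$, is
such that, for each $i \in \mathbb{Z}^d$, the conditional distribution of $Y_i$ given the rest of the variables $Y^{(-i)}$ is

$$Y_i \mid Y^{(-i)} \sim \mathcal{N}\Big( \sum_{j \in N_h} \phi_j Y_{i+j} , \ \sigma_\phi^2 \Big) . \tag{13}$$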

Define the $h$-boundary of $S$, denoted $\Delta_h(S)$, as the collection of vertices in $S$ whose distance
to $\mathbb{Z}^d \setminus S$ is at most $h$. We also define the $h$-interior of $S$ as $S^h = S \setminus \Delta_h(S)$. If $S \subset V$ is a finite
set, we denote by $\Gamma_S$ the principal submatrix of the covariance operator $\Gamma$ indexed by $S$. If $\Gamma$ is
nonsingular, each such submatrix is invertible.


2.2 A general minimax lower bound 

As is standard, an upper bound is obtained by exhibiting a test $f$ and then upper-bounding its
risk, either (2) or (4) according to whether $\Gamma$ is known or unknown. In order to derive a lower
bound for the minimax risk, we follow the standard argument of choosing a prior distribution on
the class of alternatives and then lower-bounding the minimax risk with the resulting average risk.
When $\Gamma$ is known, this leads us to select a prior on $\mathcal{C}$, denoted by $\nu$, and consider

$$R_{\nu,\Gamma}(f) = \mathbb{P}_0\{f(X) = 1\} + \sum_{S \in \mathcal{C}} \nu(S)\, \mathbb{P}_{S,\Gamma}\{f(X) = 0\} \quad \text{and} \quad R^*_{\nu,\Gamma} = \inf_f R_{\nu,\Gamma}(f) . \tag{14}$$


The latter is the Bayes risk associated with $\nu$. By placing a prior on the class of alternative
distributions, the alternative hypothesis becomes effectively simple (as opposed to composite). The
advantage of this is that the optimal test may be determined explicitly. Indeed, the Neyman-Pearson
fundamental lemma implies that the likelihood ratio test $f^*_{\nu,\Gamma}(x) = \mathbb{I}\{L_{\nu,\Gamma}(x) > 1\}$, with

$$L_{\nu,\Gamma} = \sum_{S \in \mathcal{C}} \nu(S)\, \frac{d\mathbb{P}_{S,\Gamma}}{d\mathbb{P}_0} ,$$

minimizes the average risk. In most of the paper, $\nu$ will be chosen as the uniform distribution on
the class $\mathcal{C}$. This is because the sets in $\mathcal{C}$ play almost the same role (although not exactly, because
of boundary effects).

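When $\Gamma$ is only known to belong to some class $\mathfrak{G}$, we also need to choose a prior on $\mathfrak{G}$, which
we denote by $\pi$, leading to

$$R_{\nu,\pi}(f) = \mathbb{P}_0\{f(X) = 1\} + \sum_{S \in \mathcal{C}} \nu(S) \int \mathbb{P}_{S,\Gamma}\{f(X) = 0\}\, \pi(d\Gamma) \quad \text{and} \quad R^*_{\nu,\pi} = \inf_f R_{\nu,\pi}(f) . \tag{15}$$

In this case, the test that minimizes the average risk is the likelihood ratio test
$f^*_{\nu,\pi}(x) = \mathbb{I}\{L_{\nu,\pi}(x) > 1\}$, where

$$L_{\nu,\pi} = \sum_{S \in \mathcal{C}} \nu(S)\, \frac{d\mathbb{P}_{S,\pi}}{d\mathbb{P}_0} , \qquad \mathbb{P}_{S,\pi} = \int \mathbb{P}_{S,\Gamma}\, \pi(d\Gamma) .$$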

In both cases, we then proceed to bound the second moment of the resulting likelihood ratio
under the null. Indeed, in a general setting, if $L$ is the likelihood ratio for $\mathbb{P}_0$ versus $\mathbb{P}_1$ and $R$
denotes its risk, then (Lehmann and Romano, 2005, Problem 3.10)

$$R = 1 - \frac{1}{2} \mathbb{E}_0 |L(X) - 1| \ge 1 - \frac{1}{2} \sqrt{\mathbb{E}_0[L(X)^2] - 1} , \tag{16}$$

where the inequality follows by the Cauchy-Schwarz inequality.

Remark 2. Working with the minimax risk (as we do here) allows us to bypass making an explicit
choice of prior, although one such choice is eventually made when deriving a lower bound. Another
advantage is that the minimax risk is monotone with respect to the class $\mathcal{C}$ in the sense that if
$\mathcal{C}' \subset \mathcal{C}$, then the minimax risk corresponding to $\mathcal{C}'$ is at most as large as that corresponding to $\mathcal{C}$.
This monotonicity does not necessarily hold for the Bayes risk. See Addario-Berry et al. (2010) for
a discussion in the context of the detection-of-means problem.

We now state a general minimax lower bound. (Recall that all the proofs are in Section 7.) 
Although the result is stated for a class C of disjoint subsets, using the monotonicity of the minimax 
risk, the result can be used to derive lower bounds in more general settings. It is particularly 
useful in the context of detecting blob-like anomalous regions in the lattice. (The same general 
approach is also fruitful in the detection-of-means setting.) We emphasize that this result is quite 
straightforward given the workflow outlined above. The technical difficulties will come with its
application to the context that interests us here, which will necessitate a good control of (17) below.
Recall the definition (15). 

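Proposition 1. Let $\{\Gamma(\phi) : \phi \in \Phi\}$ be a class of nonsingular covariance operators and let $\mathcal{C}$ be a
class of disjoint subsets of $V$. Put the uniform prior $\nu$ on $\mathcal{C}$ and let $\pi$ be a prior on $\Phi$. Then

$$R^*_{\nu,\pi} \ge 1 - \frac{1}{2|\mathcal{C}|} \Big( \sum_{S \in \mathcal{C}} V_S \Big)^{1/2} ,$$

where

$$V_S := \mathbb{E}_\pi \left[ \left( \frac{\det(\Gamma_S^{-1}(\phi_1)) \det(\Gamma_S^{-1}(\phi_2))}{\det(\Gamma_S^{-1}(\phi_1) + \Gamma_S^{-1}(\phi_2) - I_S)} \right)^{1/2} \right] , \tag{17}$$

and the expected value is with respect to $\phi_1, \phi_2$ drawn i.i.d. from the distribution $\pi$.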
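
To make the quantity in Proposition 1 concrete, here is a minimal sketch for the special case of a point-mass prior (known covariance), where the term for a set S reduces to det(P) / sqrt(det(2P - I)) with P the inverse of the covariance submatrix indexed by S. The example uses AR(1) intervals, whose precision matrix is tridiagonal in closed form; the helper names, and the choice of n = 10000, k = 20 and psi = 0.05, are illustrative assumptions rather than values from the paper.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    value = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            value = -value
        value *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return value


def ar1_precision(k, psi):
    """Inverse covariance of k >= 2 consecutive values of a unit-variance AR(1) process."""
    s = 1.0 - psi * psi
    p = [[0.0] * k for _ in range(k)]
    for i in range(k):
        p[i][i] = (1.0 if i in (0, k - 1) else 1.0 + psi * psi) / s
        if i + 1 < k:
            p[i][i + 1] = p[i + 1][i] = -psi / s
    return p


def v_term(precision):
    """Second-moment term for a point-mass prior: det(P) / sqrt(det(2P - I))."""
    k = len(precision)
    shifted = [[2.0 * precision[i][j] - (1.0 if i == j else 0.0) for j in range(k)]
               for i in range(k)]
    d = det(shifted)
    if d <= 0.0:
        raise ValueError("2P - I is not positive definite; the bound does not apply")
    return det(precision) / math.sqrt(d)


def risk_lower_bound(v_terms):
    """1 - (1 / (2|C|)) * sqrt(sum of the per-set terms)."""
    return 1.0 - math.sqrt(sum(v_terms)) / (2.0 * len(v_terms))


# Assumed example: n = 10000 observations split into 500 disjoint intervals of length 20.
v = v_term(ar1_precision(20, 0.05))
print(risk_lower_bound([v] * 500))
```

For k = 2 the term has the closed form 1 / (1 - psi^2), which gives a quick check of the determinant code.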


3 Known covariance 

We start with the case where the covariance operator $\Gamma$ is known. Although this setting is of less
practical importance, as this operator is rarely known in applications, we treat this case first for 
pedagogical reasons and also to contrast with the much more complex setting where the operator 
is unknown, treated later on. 

3.1 Lower bound

Recall the definition of the minimax risk (3) and the average risk (14). (Henceforth, to lighten the
notation, we replace subscripts in $\Gamma(\phi)$ with subscripts in $\phi$.) For any prior $\nu$ on $\mathcal{C}$, the minimax risk
is at least as large as the $\nu$-average risk, $R^*_{\mathcal{C},\phi} \ge R^*_{\nu,\phi}$, and the following corollary of Proposition 1
provides a lower bound on the latter.


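Corollary 3. Let $\mathcal{C}$ be a class of disjoint subsets of $V$ and fix $\phi \in \Phi_h$ satisfying $\|\phi\|_1 < 1/2$. Then,
letting $\nu$ denote the uniform prior over $\mathcal{C}$, we have

$$R^*_{\nu,\phi} \ge 1 - \frac{1}{2|\mathcal{C}|} \left[ \sum_{S \in \mathcal{C}} \exp\left( \frac{10\, |S|\, \|\phi\|_2^2}{1 - 2\|\phi\|_1} \right) \right]^{1/2} . \tag{18}$$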


In particular, the corollary implies that, for any fixed $\alpha \in (0,1)$, $R^*_{\nu,\phi} \ge 1 - \alpha$ as soon as

$$\frac{\|\phi\|_2^2}{1 - 2\|\phi\|_1} \le \min_{S \in \mathcal{C}} \frac{\log(4 |\mathcal{C}| \alpha^2)}{10 |S|} . \tag{19}$$


Furthermore, the hypotheses merge asymptotically (i.e., $R^*_{\nu,\phi} \to 1$) when

$$\log(|\mathcal{C}|) - \frac{10\, \|\phi\|_2^2}{1 - 2\|\phi\|_1} \max_{S \in \mathcal{C}} |S| \to \infty . \tag{20}$$

Remark 3. The condition $\|\phi\|_1 < 1/2$ in Corollary 3 is technical and likely an artifact of our
proof method. This condition arises from the term $\det^{-1/2}(2\Gamma_S^{-1}(\phi) - I_S)$ in $V_S$ in (17). For this
determinant to be positive, the smallest eigenvalue of $\Gamma_S^{-1}(\phi)$ has to be larger than $1/2$, which in
turn is enforced by $\|\phi\|_1 < 1/2$. In order to remove, or at least improve on, this constraint, we would
need to adopt a more subtle approach than applying the Cauchy-Schwarz inequality in (16). We
did not pursue this, as typically one is interested in situations where $\phi$ is small; see, for example,
how the result is applied in Section 5.


3.2 Upper bound: the generalized likelihood ratio test 


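When the covariance operator $\Gamma(\phi)$ is known, the generalized likelihood ratio test rejects the null
hypothesis for large values of

$$\max_{S \in \mathcal{C}} X_S^\top (I_S - \Gamma_S^{-1}(\phi)) X_S .$$

We use instead the statistic

$$U(X) = \max_{S \in \mathcal{C}} \frac{X_S^\top (I_S - \Gamma_S^{-1}(\phi)) X_S - \mathrm{Tr}(I_S - \Gamma_S^{-1}(\phi))}{\|I_S - \Gamma_S^{-1}(\phi)\|_F \sqrt{\log(|\mathcal{C}|)} + \|I_S - \Gamma_S^{-1}(\phi)\| \log(|\mathcal{C}|)} , \tag{21}$$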


which is based on the centering and normalization of the statistics $X_S^\top (I_S - \Gamma_S^{-1}(\phi)) X_S$, where $S \in \mathcal{C}$.
In the following result, we implicitly assume that $|\mathcal{C}| \to \infty$, which is the most interesting case.

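Proposition 2. Assume that $\phi \in \Phi_h$ satisfies $\|\phi\|_1 \le \eta < 1$ and that $|S^h| \ge |S|/2$. The test
$f(x) = \mathbb{I}\{U(x) > 4\}$ has risk $R_{\mathcal{C},\phi}(f) \le 2/|\mathcal{C}|$ when

$$\|\phi\|_2^2 \min_{S \in \mathcal{C}} |S| \ge C_0 \log(|\mathcal{C}|) , \tag{22}$$

where $C_0 > 0$ only depends on the dimension $d$ of the lattice and $\eta$.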

Comparing with Condition (20), we see that condition (22) matches (up to constants) the
minimax lower bound, so that (at least when $\|\phi\|_1 < 1/2$) the normalized generalized likelihood
ratio test based on (21) is asymptotically minimax up to a multiplicative constant. The $\ell_1$-norm
$\|\phi\|_1$ arises in the proof of Corollary 3 when bounding the largest eigenvalue of $\Gamma(\phi)$ (see Lemma 5).

4 Unknown covariance

We now consider the case where the covariance operator $\Gamma(\phi)$ of the anomalous Gaussian Markov
random field is unknown. We therefore start by defining a class of covariance operators via a class
of vectors $\phi$. Given a positive integer $h > 0$ and some $r > 0$, define

$$\Phi_h(r) := \{\phi \in \Phi_h : \|\phi\|_2 \ge r\} , \tag{23}$$

and let

$$\mathfrak{G}(h,r) := \{\Gamma(\phi) : \phi \in \Phi_h(r)\} , \tag{24}$$

which is the class of covariance operators corresponding to stationary Gaussian Markov Random
Fields with parameter in the class (23).


4.1 Lower bound 


The theorem below establishes a lower bound for the risk following the approach outlined in Section 2,
which is based on the choice of a suitable prior $\pi$ on $\Phi_h(r)$, defined as follows. By symmetry
of the elements of $\Phi_h$, one can fix a sublattice $N'_h \subset N_h$ of size $|N_h|/2$ such that any $\phi \in \Phi_h$
is uniquely defined (via symmetry) by its restriction to $N'_h$. Choose the distribution $\pi$ such that
$\phi \sim \pi$ is the unique extension to $N_h$ of the random vector $r |N_h|^{-1/2} \xi$, where the coordinates of
the random vector $\xi$ (indexed by $N'_h$) are i.i.d. Rademacher random variables (i.e., symmetric
$\pm 1$-valued random variables). Note that, if $r |N_h| < 1$, $\pi$ is acceptable since it concentrates on the set
$\{\phi \in \Phi_h : \|\phi\|_2 = r\} \subset \Phi_h(r)$. Recall the definition of the minimax risk (5) and the average risk
(15). As before, for any priors $\nu$ on $\mathcal{C}$ and $\pi$ on $\Phi_h(r)$, the minimax risk is at least as large as the
average risk with these priors, $R^*_{\mathcal{C},\mathfrak{G}(h,r)} \ge R^*_{\nu,\pi}$, and the following (much more elaborate) corollary
of Proposition 1 provides a lower bound on the latter.
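
The prior just described is simple to sample. The sketch below does so in dimension d = 1, under the assumption that each coordinate of the random vector has magnitude r / sqrt(|N_h|), which is the scaling that places the draw on the sphere of Euclidean radius r; the function name `sample_prior` and the choice of the positive offsets as the representative half-neighborhood are hypothetical.

```python
import math
import random


def sample_prior(h, r, rng):
    """Draw phi on N_h = {-h, ..., -1, 1, ..., h} (d = 1) with phi_j = phi_{-j}."""
    size = 2 * h                          # |N_h| in dimension one
    scale = r / math.sqrt(size)           # assumed scaling: gives Euclidean norm r
    phi = {}
    for j in range(1, h + 1):             # one representative per symmetric pair
        sign = rng.choice((-1.0, 1.0))    # Rademacher coordinate
        phi[j] = phi[-j] = scale * sign
    return phi


phi = sample_prior(3, 0.2, random.Random(0))
l2 = math.sqrt(sum(c * c for c in phi.values()))
l1 = sum(abs(c) for c in phi.values())
print(l2, l1)
```

Every draw has Euclidean norm exactly r and l1-norm r sqrt(|N_h|), so the draw is a valid parameter as soon as the latter is below one.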


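Theorem 1. There exists a constant $C_0 > 0$ such that the following holds. Let $\mathcal{C}$ be a class of
disjoint subsets of $V$ and let $\nu$ denote the uniform prior over $\mathcal{C}$. Let $\alpha \in (0,1)$ and assume that the
neighborhood size $|N_h|$ satisfies

$$|N_h| \le \min_{S \in \mathcal{C}} \left[ |S|^{2/5} \wedge \frac{|S|}{|\Delta_{2h}(S)|} \right] \log^{-1/6}(|\mathcal{C}|/\alpha) . \tag{25}$$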


Then $R^*_{\nu,\pi} \ge 1 - \alpha$ as soon as

$$r^2 \max_{S \in \mathcal{C}} |S| \le C_0 \left[ \sqrt{|N_h| \log(|\mathcal{C}|/\alpha)} \vee \log(|\mathcal{C}|/\alpha) \right] . \tag{26}$$


This bound is our main impossibility result. Its proof relies on a number of auxiliary results for
Gaussian Markov Random Fields (Section 7.3) that may be useful for other problems of estimating
Gaussian Markov Random Fields. Notice that the second term in (26) is what appears in (19),
which we saw arises in the case where the covariance is known. In light of this fact, we may interpret
the first term in (26) as the ‘price to pay’ for adapting to an unknown covariance operator in the
class of covariance operators of Gaussian Markov random fields with dependency radius $h$.


4.2 Upper bound: a Fisher-type test 

We introduce a test whose performance essentially matches the minimax lower bound of Theorem 
1. Comparatively, the construction and analysis of this test is much more involved than that of the 
generalized likelihood ratio test of Section 3.2. 

Let $F_i = (X_{i+v} : 1 \le |v|_\infty \le h)$, seen as a vector, and let $F_{S,h}$ be the matrix with row
vectors $F_i$, $i \in S^h$. Also, let $X_{S,h} = (X_i : i \in S^h)$. Under the null hypothesis, each variable
$X_i$ is independent of $F_i$, although $X_i$ is correlated with some $(F_j, j \ne i)$. Under the alternative
hypothesis, there exists a subset $S$ and a vector $\phi \in \Phi_h$ such that

$$X_{S,h} = F_{S,h}\, \phi + \epsilon_{S,h} , \tag{27}$$

where each component $\epsilon_i$ of $\epsilon_{S,h}$ is independent of the corresponding vector $F_i$, but the $\epsilon_i$'s are not
necessarily independent. Equation (27) is the so-called conditional autoregressive (CAR) representation
of a Gaussian Markov random field (Guyon, 1995). For Gaussian Markov random fields, the
celebrated pseudo-likelihood method (Besag, 1975) amounts to estimating $\phi$ by taking least-squares
in (27).

Returning to our testing problem, observe that the null hypothesis is true if and only if all
the parameters of the conditional expectation of $X_{S,h}$ given $F_{S,h}$ are zero. In analogy with the
analysis-of-variance approach for testing whether the coefficients of a linear regression model are
all zero, we consider a Fisher-type statistic

$$T^* = \max_{S\in\mathcal{C}} T_S, \qquad T_S := \frac{|S^h|\,\|\Pi_{S,h}X_{S,h}\|_2^2}{\|X_{S,h} - \Pi_{S,h}X_{S,h}\|_2^2}, \qquad (28)$$

where $\Pi_{S,h} := F_{S,h}(F_{S,h}^\top F_{S,h})^{-1}F_{S,h}^\top$ is the orthogonal projection onto the column space of $F_{S,h}$.
Since in the linear model (27) the response vector $X_{S,h}$ is not independent of the design matrix
$F_{S,h}$, the statistic $T_S$ does not follow an F-distribution. Nevertheless, we are able to control the
deviations of $T^*$, both under the null and alternative hypotheses, leading to the following performance
bound. Recall the definition (4).
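As a rough illustration of how the Fisher-type statistic could be computed, the sketch below treats the one-dimensional case ($d = 1$) with $S$ an interval. It builds the design matrix from the $2h$ neighbors of each node in the $h$-interior and obtains the projection by least squares. The names `fisher_stat` and `scan_stat`, the 0-based indexing, and the restriction to intervals of one fixed length are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def fisher_stat(x, start, length, h):
    """Sketch of T_S for the interval S = {start, ..., start+length-1} (d = 1)."""
    # h-interior of S: nodes whose whole h-neighborhood lies inside S.
    interior = np.arange(start + h, start + length - h)
    offsets = [v for v in range(-h, h + 1) if v != 0]        # N_h in dimension 1
    F = np.column_stack([x[interior + v] for v in offsets])  # rows play the role of F_i
    y = x[interior]                                          # response vector X_{S,h}
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)             # least-squares fit
    fitted = F @ coef                                        # projection of y onto col(F)
    resid = y - fitted
    return len(interior) * (fitted @ fitted) / (resid @ resid)

def scan_stat(x, length, h):
    """Maximum of T_S over all intervals of the given length (hypothetical scan class)."""
    n = len(x)
    return max(fisher_stat(x, s, length, h) for s in range(n - length + 1))

rng = np.random.default_rng(0)
x = rng.standard_normal(200)          # white noise, i.e. data drawn under the null
t_single = fisher_stat(x, 10, 50, 2)
t_scan = scan_stat(x, 50, 2)
```

The statistic is invariant to rescaling of the data, and the scan costs one least-squares fit per candidate set. The sketch does not attempt to calibrate the rejection threshold.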


Theorem 2. There exist four positive constants $C_1, C_2, C_3, C_4$ depending only on $d$ such that the
following holds. Assume that

$$|N_h|^4 \vee |N_h|^2\log(|\mathcal{C}|) \le C_1\min_{S\in\mathcal{C}}|S^h|. \qquad (29)$$

Fix $\alpha$ and $\beta$ in $(0,1)$ such that


og(JV 


(30) 


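Then, under the null hypothesis,

$$\mathbb{P}\Big\{T^* \ge |N_h| + C_3\Big[\sqrt{|N_h|\big(\log(|\mathcal{C}|) + 1 + \log(\alpha^{-1})\big)} + \log(|\mathcal{C}|) + \log(\alpha^{-1})\Big]\Big\} \le \alpha, \qquad (31)$$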

while under the alternative, 

p |r* > |N,| + C 4 \S^\ A - V^(l + log"(/3-')) } > 1 - /? ■ (32) 

In particular, if $\alpha_n, \beta_n \to 0$ are arbitrary positive sequences, then the test $f$ that rejects the null
hypothesis if

$$T^* \ge |N_h| + C_3\Big[\sqrt{|N_h|\big(\log(|\mathcal{C}|) + 1 + \log(\alpha_n^{-1})\big)} + \log(|\mathcal{C}|) + \log(\alpha_n^{-1})\Big]$$

satisfies $R_{\mathcal{C},\mathfrak{G}(h,r)}(f) \to 0$ as soon as


r^> 


Co 


minsec 

where Cq > 0 depends only on d. 


y|Nft| (log(|C|) + log(^) + log®(^)) V log(|C|) y log(^ 


(33) 


Comparing with the minimax lower bound established in Theorem 1, we see that this test is
nearly optimal with respect to $h$, the size $|\mathcal{C}|$ of the collection, and the size $|S|$ of the anomalous
region (under the alternative).


5 Examples: cubes and blobs 

In this section we specialize our general results proved in the previous subsections to classes of 
cubes, and more generally, blobs. 


5.1 Cubes 


Consider the problem of detecting an anomalous cube-shaped region. Let $\ell \in \{1, \ldots, m\}$ and assume
that $m$ is an integer multiple of $\ell$ (for simplicity). Let $\mathcal{C}$ denote the class of all discrete hypercubes of
side length $\ell$, that is, sets of the form $S = \prod_{s=1}^d\{b_s, \ldots, b_s + \ell - 1\}$, where $b_s \in \{1, \ldots, m - \ell + 1\}$. Each
such hypercube $S \in \mathcal{C}$ contains $|S| = k := \ell^d$ nodes, and the class is of size $|\mathcal{C}| = (m - \ell + 1)^d \le n$.

The lower bounds for the risk established in Corollary 3 and Theorem 1 are not directly applicable
here since these results require subsets of the class $\mathcal{C}$ to be disjoint. However, they apply
to any subclass $\mathcal{C}' \subset \mathcal{C}$ of disjoint subsets and, as mentioned in Section 2, any lower bound on the
minimax risk over $\mathcal{C}'$ applies to the minimax risk over $\mathcal{C}$. A natural choice for $\mathcal{C}'$ here is that of
all cubes of the form $S = \prod_{s=1}^d\{a_s\ell + 1, \ldots, (a_s + 1)\ell\}$, where $a_s \in \{0, \ldots, m/\ell - 1\}$. Note that
$|\mathcal{C}'| = (m/\ell)^d = n/k$.
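The counting claims for the disjoint subclass can be checked by brute force on a small lattice. The sketch below enumerates the non-overlapping cubes for illustrative values of $m$, $\ell$ and $d$; the helper name `disjoint_cubes` and the chosen parameter values are assumptions made for this example.

```python
from itertools import product

def disjoint_cubes(m, ell, d):
    """Cubes prod_s {a_s*ell + 1, ..., (a_s + 1)*ell} with a_s in {0, ..., m/ell - 1}."""
    assert m % ell == 0, "m is assumed to be an integer multiple of ell"
    cubes = []
    for a in product(range(m // ell), repeat=d):
        sides = [range(a_s * ell + 1, (a_s + 1) * ell + 1) for a_s in a]
        cubes.append(frozenset(product(*sides)))
    return cubes

m, ell, d = 6, 2, 2                  # illustrative values only
n, k = m ** d, ell ** d              # lattice size and cube size
sub = disjoint_cubes(m, ell, d)
covered = frozenset().union(*sub)    # union of all cubes in the subclass
num_all = (m - ell + 1) ** d         # size of the full class of cubes
```

Because the cubes tile the lattice, the subclass has exactly $n/k$ members, which is the quantity entering the lower bounds.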


h bounded. Consider first the case where the radius h of the neighborhood is bounded. We may 
apply Corollary 3 to get 


> 1 - 


2nV2 


exp 


jkm_\ 
I-mil) 


For a given r > 0 satisfying 2|N/i|r < 1, we can choose a parameter constant over N/i such that 
\\4>\\2 = r and ||(?i||i = {2h + - 1. Since .Rc,®(h,r) - ^c,< 3 (h,r) ^ when 

n —>• oo, if {k,(j)) = (/c(n), </>(n)) satisfies log(n) k n and < log(n/A:)/(llfe). Comparing 
with the performance of the Fisher test of Section 4.2, in this particular case, Condition (29) is met,
and letting $\alpha = \alpha(n) \to 0$ and $\beta = \beta(n) \to 0$ slowly, we conclude from (33) that this test (denoted
$f$) has risk $R_{\mathcal{C},\mathfrak{G}(h,r)}(f) \to 0$ when $r^2 \ge C_0\log(n)/k$ for some constant $C_0$. Thus, in this setting,
the Fisher test, without knowledge of $\phi$, achieves the correct detection rate as long as $k \le n^b$ for
some fixed $b < 1$.


h unbounded. When h is unbounded, we obtain a sharper bound by using Theorem 1 instead of 
Corollary 3. Specialized to the current setting, we derive the following. 


Corollary 4. There exist two positive constants $C_1$ and $C_2$ depending only on $d$ such that the
following holds. Assume that the neighborhood size $h$ is small enough that


mi < Cl 


k 


loglVW2) (n) 


/\ log^/^ (^) d+2^d+2 logsdte 


(34) 
Then the minimax risk tends to one when $n \to \infty$ as soon as $(k,h,r) = (k(n),h(n),r(n))$ satisfies
$n/k \to \infty$ and

$$r^2 \le C_2\left[\frac{\log(n/k)}{k} \vee \frac{\sqrt{|N_h|\log(n/k)}}{k}\right]. \qquad (35)$$


Note that, in the case of a square neighborhood, $|N_h| = (2h+1)^d - 1$. Comparing with the
performance of the Fisher test, in this particular case, Condition (29) is equivalent to $|N_h| \le
C_0\big(k^{1/4} \wedge \sqrt{k/\log(n)}\big)$ for some constant $C_0$. When $k$ is polynomial in $n$, this condition is stronger
than Condition (34) unless d < 5. In any case, assuming h is small enough that both (29) and (34) 
hold, and letting $\alpha = \alpha(n) \to 0$ and $\beta = \beta(n) \to 0$ slowly, we conclude from (33) that the Fisher
test has risk $R_{\mathcal{C},\mathfrak{G}(h,r)}$ tending to zero when

$$r^2 \ge C_0\left[\frac{\log(n)}{k} \vee \frac{\sqrt{|N_h|\log(n)}}{k}\right]$$

for some large-enough constant $C_0 > 0$, matching the lower bound (35) up to a multiplicative
constant as long as $k \le n^b$ for some fixed $b < 1$.

In conclusion, whether h is fixed or unbounded but growing slowly enough, the Fisher test 
achieves a risk matching the lower bound up to a multiplicative constant. 


5.2 Blobs 

So far, we only considered hypercubes, but our results generalize immediately to much larger classes 
of blob-like regions. Here, we follow the same strategy used in the detection-of-means setting, for 
example, in Arias-Castro et al. (2011, 2005); Huo and Ni (2009). 

Fix two positive integers $\ell_0 \le \ell^0$ and let $\mathcal{C}$ be a class of subsets $S$ such that there are hypercubes
$S_0$ and $S^0$, of respective side lengths $\ell_0$ and $\ell^0$, such that $S_0 \subset S \subset S^0$. Letting $\mathcal{C}_0$ and $\mathcal{C}^0$ denote
the classes of hypercubes of side lengths $\ell_0$ and $\ell^0$, respectively, our lower bound for the worst-case
risk associated with the class $\mathcal{C}^0$ obtained from Corollary 4 applies directly to $\mathcal{C}$ (although not
completely obvious, this follows from our analysis), while scanning over $\mathcal{C}_0$ in the Fisher test yields
the performance stated above for the class of cubes. In particular, if $\ell_0/\ell^0$ remains bounded away
from 0, the problem of detecting a region in $\mathcal{C}$ is of difficulty comparable to detecting a hypercube
in $\mathcal{C}_0$ or $\mathcal{C}^0$.

When the size of the anomalous region $k$ is unknown, meaning that the class $\mathcal{C}$ of interest
includes regions of different sizes, we can simply scan over dyadic hypercubes as done in the first
step of the multiscale method of Arias-Castro et al. (2005). This does not change the rate as there
are fewer than $2n$ dyadic hypercubes. See also Arias-Castro et al. (2011).
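The count of dyadic hypercubes is a geometric sum and can be checked directly. The sketch below assumes the side length of the lattice is a power of two, $m = 2^J$ (an assumption made for this example), so that there are $(2^{J-j})^d$ dyadic hypercubes of side $2^j$ for each scale $j = 0, \ldots, J$. The function name is hypothetical.

```python
def count_dyadic_hypercubes(J, d):
    """Number of dyadic hypercubes in {1, ..., 2**J}**d, summed over scales j = 0..J."""
    return sum((2 ** (J - j)) ** d for j in range(J + 1))

# Compare the count with 2n, where n = (2**J)**d, for a few small lattices.
checks = [(J, d, count_dyadic_hypercubes(J, d), 2 * (2 ** J) ** d)
          for J in range(8) for d in (1, 2, 3)]
```

In every case the first count stays strictly below $2n$, so a union bound over the dyadic class costs at most a $\log(2n)$ term.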

We note that when $\ell_0/\ell^0 = o(1)$, scanning over hypercubes may not be very powerful. For
example, for "convex" sets, meaning when

$$\mathcal{C} = \big\{S = K \cap V : K \subset \mathbb{R}^d \text{ convex},\ |K \cap V| = k\big\},$$

it is more appropriate to scan over ellipsoids due to John's ellipsoid theorem (John, 1948), which
implies that for each convex set $K \subset \mathbb{R}^d$, there is an ellipsoid $E \subset K$ such that $\mathrm{vol}(E) \ge d^{-d}\,\mathrm{vol}(K)$.
For the case where $d = 2$ and the detection-of-means problem, Huo and Ni (2009), expanding on
ideas proposed in Arias-Castro et al. (2005), scan over parallelograms, which can be done faster
than scanning over ellipses.
Finally, we mention that what we said in this section may apply to other types of regular
lattices, and also to lattice-like graphs such as typical realizations of a random geometric graph. 
See Arias-Castro et al. (2011); Walther (2010) for detailed treatments in the detection-of-means 
setting. 


6 Discussion 

We provided lower bounds and proposed near-optimal procedures for testing for the presence of a 
piece of a Gaussian Markov random field. These results constitute some of the first mathematical 
results for the problem of detecting a textured object in a noisy image. We leave open some
questions and generalizations of interest.

More refined results. We leave behind the delicate and interesting problem of finding the exact
detection rates, with tight multiplicative constants. This is particularly appealing for simple settings 
such as finding an interval of an autoregressive process, as described in Section 1.3. Our proof 
techniques, despite their complexity, are not sufficiently refined to get such sharp bounds. We 
already know that, in the detection-of-means setting, bounding the variance of the likelihood ratio 
does not yield the right constant. The variant which consists of bounding the first two moments of 
a carefully truncated likelihood ratio, possibly pioneered in Ingster (1999), is applicable here, but 
the calculations are quite complicated and we leave them for future research. 

Texture over texture. Throughout the paper we assumed that the background is Gaussian white 
noise. This is not essential, but makes the narrative and results more accessible. A more general, 
and also more realistic setting, would be that of detecting a region where the dependency structure 
is markedly different from the remainder of the image. This setting has been studied in the context 
of time series, for example, in some of the references given in Section 1.3. However, we are not 
aware of existing theoretical results in higher-dimensional settings such as in images. 

Other dependency structures. We focused on Markov random fields with limited neighborhood
range (quantified by $h$ earlier in the paper). This is a natural first step, particularly since these are
popular models for time series and textures. However, one could envision studying other dependency
structures, such as short-range dependency, defined in Samorodnitsky (2006) as situations where
the covariances are summable in the following sense

$$\sup_{i \in V_\infty}\sum_{j \in V_\infty}|\Gamma_{i,j}| < \infty.$$


7 Proofs 

7.1 Proof of Proposition 1 


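The Bayes risk is achieved by the likelihood ratio test $f^*_{\nu,\pi}(x) = \mathbb{I}\{L_{\nu,\pi}(x) > 1\}$, where

$$L_{\nu,\pi}(x) = \frac{1}{|\mathcal{C}|}\sum_{S\in\mathcal{C}}L_S(x), \qquad L_S(x) = \int\frac{d\mathbb{P}_{S,\Gamma(\phi)}(x)}{d\mathbb{P}_0(x)}\,\pi(d\phi).$$

In our Gaussian model,

$$L_S(x) = \mathbb{E}_\pi\Big[\exp\Big(\tfrac12 x_S^\top\big(I_S - \Gamma_S^{-1}(\phi)\big)x_S - \tfrac12\log\det(\Gamma_S(\phi))\Big)\Big], \qquad (36)$$

where the expectation is taken with respect to the random draw of $\phi \sim \pi$. Then, by (16),

$$R^*_{\nu,\pi} = 1 - \tfrac12\,\mathbb{E}_0|L_{\nu,\pi}(X) - 1| \ge 1 - \tfrac12\sqrt{\mathbb{E}_0[L_{\nu,\pi}(X)^2] - 1}. \qquad (37)$$

(Recall that $\mathbb{E}_0$ stands for expectation with respect to the standard normal random vector $X$.)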

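We proceed to bound the second moment of the likelihood ratio under the null hypothesis.
Summing over $S, T \in \mathcal{C}$, we have

$$\begin{aligned}
\mathbb{E}_0[L_{\nu,\pi}(X)^2]
&= \frac{1}{|\mathcal{C}|^2}\sum_{S,T\in\mathcal{C}}\mathbb{E}_0[L_S(X)L_T(X)] \\
&= \frac{1}{|\mathcal{C}|^2}\sum_{S\ne T}\mathbb{E}_0[L_S(X)]\,\mathbb{E}_0[L_T(X)] + \frac{1}{|\mathcal{C}|^2}\sum_{S\in\mathcal{C}}\mathbb{E}_0[L_S^2(X)] \\
&\le 1 + \frac{1}{|\mathcal{C}|^2}\sum_{S\in\mathcal{C}}\mathbb{E}_0\mathbb{E}_\pi\Big[\exp\Big(X_S^\top\big(I_S - \tfrac12\Gamma_S^{-1}(\phi_1) - \tfrac12\Gamma_S^{-1}(\phi_2)\big)X_S - \tfrac12\log\det(\Gamma_S(\phi_1)) - \tfrac12\log\det(\Gamma_S(\phi_2))\Big)\Big] \\
&= 1 + \frac{1}{|\mathcal{C}|^2}\sum_{S\in\mathcal{C}}\mathbb{E}_\pi\Big[\exp\Big(-\tfrac12\log\det\big(\Gamma_S^{-1}(\phi_1) + \Gamma_S^{-1}(\phi_2) - I_S\big) - \tfrac12\log\det\big(\Gamma_S(\phi_1)\Gamma_S(\phi_2)\big)\Big)\Big] \\
&= 1 + \frac{1}{|\mathcal{C}|^2}\sum_{S\in\mathcal{C}}V_S,
\end{aligned}$$

where in the second equality we used the fact that $S \ne T$ are disjoint, and therefore $L_S(X)$ and
$L_T(X)$ are independent, and in the third we used the fact that $\mathbb{E}_0[L_S(X)] = 1$ for all $S \in \mathcal{C}$.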
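The Gaussian expectation in the computation above can be evaluated with a standard identity, sketched here for convenience. The symbols $M$, $A$ and $B$ are shorthand introduced only for this remark, and the identity requires $I_S - 2M$ to be positive definite, which is assumed.

```latex
% For a standard normal vector X_S and a symmetric matrix M with I_S - 2M positive definite:
\mathbb{E}_0\big[\exp\big(X_S^\top M X_S\big)\big] = \det(I_S - 2M)^{-1/2}.
% Taking M = I_S - \tfrac12 A - \tfrac12 B, with A = \Gamma_S^{-1}(\phi_1) and B = \Gamma_S^{-1}(\phi_2):
I_S - 2M = A + B - I_S,
\qquad
\mathbb{E}_0\big[\exp\big(X_S^\top M X_S\big)\big]
  = \exp\Big(-\tfrac12 \log\det\big(\Gamma_S^{-1}(\phi_1) + \Gamma_S^{-1}(\phi_2) - I_S\big)\Big).
```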


7.2 Deviation inequalities 

Here we collect a few more-or-less standard inequalities that we need in the proofs. We start with 
the following standard tail bounds for Gaussian quadratic forms. See, e.g., Example 2.12 and 
Exercise 2.9 in Boucheron et al. (2013). 

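Lemma 1. Let $Z$ be a standard normal vector in $\mathbb{R}^d$ and let $R$ be a symmetric $d \times d$ matrix. Then

$$\mathbb{P}\big\{Z^\top RZ - \mathrm{Tr}(R) \ge 2\|R\|_F\sqrt{t} + 2\|R\|t\big\} \le e^{-t}, \qquad \forall t > 0.$$

Furthermore, if the matrix $R$ is positive semidefinite, then

$$\mathbb{P}\big\{Z^\top RZ - \mathrm{Tr}(R) \le -2\|R\|_F\sqrt{t}\big\} \le e^{-t}, \qquad \forall t > 0.$$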

Lemma 2. There exists a positive constant C such that the following holds. For any Gaussian 
chaos Z up to order 4 and any t > 0, 

e||Z -E[Z]| > C'Varfo2(z)i2| < _ 

Proof. This deviation inequality is a consequence of the hypercontractivity of Gaussian chaos. More
precisely, Theorem 3.2.10 and Corollary 3.2.6 in de la Peña and Giné (1999) state that

$$\mathbb{E}\exp\left[\left(\frac{|Z - \mathbb{E}[Z]|}{C\,\mathrm{Var}^{1/2}(Z)}\right)^{1/2}\right] \le 2,$$

where $C$ is a numerical constant. Then, we apply Markov's inequality to prove the lemma. □
Lemma 3. There exists a positive constant $C$ such that the following holds. Let $\mathcal{F}$ be a compact
set of symmetric $r \times r$ matrices and let $Y \sim \mathcal{N}(0, I_r)$. For any $t > 0$, the random variable
$Z := \sup_{R\in\mathcal{F}}\mathrm{Tr}[RYY^\top]$ satisfies

$$\mathbb{P}\{Z \ge \mathbb{E}(Z) + t\} \le \exp\left(-C\left(\frac{t^2}{\mathbb{E}(W)} \wedge \frac{t}{B}\right)\right), \qquad (38)$$

where $W := \sup_{R\in\mathcal{F}}\mathrm{Tr}(RYY^\top R)$ and $B := \sup_{R\in\mathcal{F}}\|R\|$.

A slight variation of this result, where $Z$ is replaced by $\sup_{R\in\mathcal{F}}\mathrm{Tr}[R(YY^\top - I_r)]$, is proved in
Verzelen (2010) using the exponential Efron-Stein inequalities of Boucheron et al. (2005). Their
arguments straightforwardly adapt to Lemma 3.

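Lemma 4 (Davidson and Szarek (2001)). Let $W$ be a standard Wishart matrix with parameters
$(n, d)$ satisfying $n > d$. Then for any number $0 < x < 1$,

$$\mathbb{P}\Big\{\lambda^{\max}(W) \ge n\big(1 + \sqrt{d/n} + \sqrt{2x/n}\big)^2\Big\} \le e^{-x},$$

$$\mathbb{P}\Big\{\lambda^{\min}(W) \le n\big(1 - \sqrt{d/n} - \sqrt{2x/n}\big)^2\Big\} \le e^{-x}.$$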


7.3 Auxiliary results for Gaussian Markov random fields on the lattice 

Here we gather some technical tools and proofs for Gaussian Markov random fields on the lattice.
Recall the notation introduced in Section 2.1.


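Lemma 5. For any positive integer $h$ and $\phi \in \Phi_h$ with $\|\phi\|_1 < 1$, we have that if $\lambda$ is an eigenvalue
of the covariance operator $\Gamma(\phi)$, then

$$\frac{\sigma_\phi^2}{1 + \|\phi\|_1} \le \lambda \le \frac{\sigma_\phi^2}{1 - \|\phi\|_1}.$$

Also, we have

$$\frac{\|\phi\|_2^2}{1 + \|\phi\|_1} \le \frac{1 - \sigma_\phi^2}{\sigma_\phi^2} \le \frac{\|\phi\|_2^2}{1 - \|\phi\|_1} \qquad\text{and}\qquad 1 - \|\phi\|_1 \le \sigma_\phi^2 \le 1. \qquad (39)$$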


Proof. Recall that || • || denotes the —)• operator norm. First note that by the dehnition of (f, 

a‘p^~^{(t)) - I = and therefore 

||a2r-H<())-I||<||<()||i, (40) 

where we used the bound $\|A\| \le \sup_{i\in\mathbb{Z}^d}\|A_{i,\cdot}\|_1$. This implies that the largest eigenvalue of
$\Gamma(\phi)$ is bounded by $\sigma_\phi^2/(1 - \|\phi\|_1)$ if $\|\phi\|_1 < 1$ and that the smallest eigenvalue of $\Gamma(\phi)$ is at least
$\sigma_\phi^2/(1 + \|\phi\|_1)$. Considering the conditional regression of $Y_i$ given $Y_{-i}$ mentioned above, that is,

Yi — ^ ^ T 

l<\j\oo<h 

(with €i being standard normal independent of the Yj for j ^ i) and taking the variance of both 
sides, we obtain 


1 — ai = Var 






= </.'r(</.)<?i<||r(</.)||||,/.||^< 


1 - 


-CXa 
and therefore


1 


Rearranging this inequality and using the fact that \\<i>\\^ < ||(/>||f < ||0||i, we conclude that > 


1 — ||</>||i. The remaining bound is obtained similarly. 


□ 
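The spectral bounds for the covariance operator can be sanity-checked numerically on a finite surrogate. The sketch below replaces the infinite lattice by the one-dimensional torus $\mathbb{Z}/m\mathbb{Z}$ (periodic boundary, an assumption made for this example), builds the precision matrix proportional to $I - \Phi$ with $\Phi_{i,i\pm v} = \phi_v$, and normalizes so that every variance equals one. The function name `torus_gmrf` and the parameter values are illustrative only.

```python
import numpy as np

def torus_gmrf(phi, m):
    """Covariance of a stationary CAR model on the 1-D torus Z/mZ (requires m > 2h).

    phi maps offsets v = 1, ..., h to phi_v, with phi_{-v} = phi_v understood.
    Returns (Gamma, sigma2, Phi) where Gamma has unit diagonal.
    """
    Phi = np.zeros((m, m))
    for v, c in phi.items():
        for i in range(m):
            Phi[i, (i + v) % m] += c
            Phi[i, (i - v) % m] += c
    cov_unscaled = np.linalg.inv(np.eye(m) - Phi)   # equals Gamma / sigma^2
    sigma2 = 1.0 / cov_unscaled[0, 0]               # enforce Var(Y_i) = 1
    return sigma2 * cov_unscaled, sigma2, Phi

phi = {1: 0.2, 2: -0.1}                       # illustrative coefficients
l1 = 2 * sum(abs(c) for c in phi.values())    # ||phi||_1, counting both +v and -v
gamma, sigma2, Phi = torus_gmrf(phi, 64)
eig = np.linalg.eigvalsh(gamma)
```

On the torus the eigenvalues of $I - \Phi$ are $1 - 2\sum_v \phi_v\cos(2\pi k v/m)$, which lie in $[1 - \|\phi\|_1, 1 + \|\phi\|_1]$, so the eigenvalues of the covariance fall between $\sigma^2/(1 + \|\phi\|_1)$ and $\sigma^2/(1 - \|\phi\|_1)$.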


Recall that for any $v \in \mathbb{Z}^d$, $\gamma_v$ is the correlation between $Y_i$ and $Y_{i+v}$ and is therefore equal
to $\Gamma_{i,i+v}$. This definition does not depend on the node $i$ since $\Gamma$ is the covariance of a stationary
process.

Lemma 6. For any $h$ and any $\phi \in \Phi_h$, let $Y \sim \mathcal{N}(0, \Gamma(\phi))$. As long as $\|\phi\|_1 < 1$, the $\ell_2$ norm of
the correlations satisfies


E 


72 < 

!v — 


+ 


|2n-2 

\2^,P 


(l-||<^||l)2 I (l-||<^||l)2 


(41) 


Proof. In order to compute || 7 |||, we use the spectral density of Y defined by 

fioji,... ,u!d) = ^ 7i;i,...i;dexp j , (wi,... ,a;d) G (-7r,7r]'^ . 

Following (Guyon, 1995, Sect. 1.3) or (Rue and Held, 2005, Sect.2. 6 .5), we express the spectral 
density in terms of and 


1 _ 

f{uJl,...,UJd) (tI 


1 - ^ 

v.,l<\v\oc<h<^'L^ 


where (•, •) denotes the scalar product in M'^. As a consequence. 
Relying on Parseval formula, we conclude 


E^^ 


= (2vr) 


-7r;7rj' 

,4 


/(Wl, ...,UJd)- 


1 l2 


< 


< 


< 


< 


< 


a 


{2-kY 

1 


doJi ... dujd 


- 1 


(27r)'^(l - \\4>\\iY (2vr)'^/(a;i, ...,u}d) 


didi ... dud 


a A 






Ar,‘Aj) 


v^l<\v\oQ<hG.'L^ 


dcui ... dud 


a A 


(2vr)'^(l - \\4>\\iY 
2 

+ 


+ E 


{ 21)72 


'»d<l'y|oo ^ 


1 - 


I-Uh 




+ 


(l-||<^||l)2 / ' (1-||^||,)2 ’ 
where we used (39) in the last line.


□ 


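Lemma 7 (Conditional representation). For any $h$ and any $\phi \in \Phi_h$, let $Y \sim \mathcal{N}(0, \Gamma(\phi))$. Then
for any $i \in V_\infty$, the random variable $\epsilon_i$ defined by the conditional regression $Y_i = \sum_{v\in N_h}\phi_vY_{i+v} + \epsilon_i$
satisfies that

1. $\epsilon_i$ is independent of all $Y_j$, $j \ne i$, and $\mathrm{Cov}(\epsilon_i, Y_i) = \mathrm{Var}(\epsilon_i) = \sigma_\phi^2$.

2. For any $i \ne j$, $\mathrm{Cov}(\epsilon_i, \epsilon_j) = -\phi_{i-j}\sigma_\phi^2$ if $|i - j|_\infty \le h$ and 0 otherwise.

Proof. The first independence property is a classical consequence of the conditional regression
representation for Gaussian random vectors, see, for example, Lauritzen (1996). Since $\mathrm{Var}(\epsilon_i)$ is
the conditional variance of $Y_i$ given $Y^{(-i)}$, it equals $[(\Gamma^{-1}(\phi))_{i,i}]^{-1} = \sigma_\phi^2$. Furthermore,

$$\mathrm{Cov}(\epsilon_i, Y_i) = \mathrm{Var}(\epsilon_i) + \sum_{v\in N_h}\phi_v\,\mathrm{Cov}(\epsilon_i, Y_{i+v}) = \mathrm{Var}(\epsilon_i),$$

by the independence of $\epsilon_i$ and $Y^{(-i)}$. Finally, consider any $i \ne j$,

$$\mathrm{Cov}(\epsilon_i, \epsilon_j) = \mathrm{Cov}(\epsilon_i, Y_j) - \sum_{v\in N_h}\phi_v\,\mathrm{Cov}(\epsilon_i, Y_{j+v}),$$

where all the terms are equal to zero with the possible exception of $v = i - j$. The result follows. □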
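The two covariance identities of the conditional representation can be verified by direct matrix algebra on a finite surrogate. As an assumption made for this example, the lattice is replaced by the one-dimensional torus $\mathbb{Z}/m\mathbb{Z}$, on which the residual vector is $\epsilon = (I - \Phi)Y$ with $\Phi_{i,i\pm v} = \phi_v$. The variable names and parameter values are illustrative.

```python
import numpy as np

m, phi = 32, {1: 0.2, 2: -0.1}       # illustrative torus size and coefficients
Phi = np.zeros((m, m))
for v, c in phi.items():
    for i in range(m):
        Phi[i, (i + v) % m] += c
        Phi[i, (i - v) % m] += c

A = np.eye(m) - Phi                  # residuals: eps = A @ Y
G = np.linalg.inv(A)
sigma2 = 1.0 / G[0, 0]               # normalization giving Var(Y_i) = 1
Gamma = sigma2 * G                   # covariance of Y

cov_eps_Y = A @ Gamma                # entries Cov(eps_i, Y_j)
cov_eps = A @ Gamma @ A.T            # entries Cov(eps_i, eps_j)
```

Here `cov_eps_Y` is $\sigma^2$ times the identity, so each residual is uncorrelated with every other node, and `cov_eps` equals $\sigma^2(I - \Phi)$, whose off-diagonal entries are $-\phi_{i-j}\sigma^2$ within range $h$ and zero beyond.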


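Lemma 8 (Comparison of $\Gamma^{-1}(\phi)$ and $\Gamma_S^{-1}(\phi)$). As long as $\|\phi\|_1 < 1$, the following properties
hold:

1. If $i \in S^h$ or if $j \in S^h$, then $(\Gamma_S^{-1}(\phi))_{i,j} = (\Gamma^{-1}(\phi))_{i,j}$.

2. If $i \in S^h$ and $j \in \Delta_h(S)$, then $1 \le (\Gamma_S^{-1}(\phi))_{j,j} \le (\Gamma_S^{-1}(\phi))_{i,i}$.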

3. If i ^ I^h{S), then — (i-||<^||i)3' 

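Proof. We prove each part in turn.

Part 1. Consider $i \in S^h$ and any $j \in S$. By the Markov property, conditionally to $(Y_{i+k}, 1 \le
|k|_\infty \le h)$, $Y_i$ is independent of all the remaining variables. Since all vertices $i + k$ with $1 \le |k|_\infty \le h$
belong to $S$, the conditional distribution of $Y_i$ given $Y^{(-i)}$ is the same as the conditional distribution
of $Y_i$ given $(Y_j, j \in S\setminus\{i\})$. This conditional distribution characterizes the $i$-th row of the inverse
covariance matrix $\Gamma_S^{-1}$. Also, the conditional variance of $Y_i$ given $Y^{(-i)}$ is $[(\Gamma^{-1}(\phi))_{i,i}]^{-1}$ and the
conditional variance of $Y_i$ given $(Y_j, j \in S\setminus\{i\})$ is $[(\Gamma_S^{-1}(\phi))_{i,i}]^{-1}$. Furthermore, $-(\Gamma^{-1}(\phi))_{i,j}/(\Gamma^{-1}(\phi))_{i,i}$ is
the $j$-th parameter of the conditional regression of $Y_i$ given $Y^{(-i)}$, and therefore we conclude that
$(\Gamma^{-1}(\phi))_{i,i} = (\sigma_\phi^2)^{-1} = (\Gamma_S^{-1}(\phi))_{i,i}$ and $(\Gamma^{-1}(\phi))_{i,j}/(\Gamma^{-1}(\phi))_{i,i} = -\phi_{i-j} = (\Gamma_S^{-1}(\phi))_{i,j}/(\Gamma_S^{-1}(\phi))_{i,i}$.

Part 2. Consider any vertex $i \in S^h$ and $j \in \Delta_h(S)$. Since $1/(\Gamma_S^{-1}(\phi))_{j,j}$ and $1/(\Gamma_S^{-1}(\phi))_{i,i}$ are
the conditional variances of $Y_j$ and $Y_i$ given $(Y_k, k \in S\setminus\{j\})$ and $(Y_k, k \in S\setminus\{i\})$, respectively, we
have

$$\begin{aligned}
\frac{1}{(\Gamma_S^{-1}(\phi))_{j,j}} &= \mathrm{Var}\big(Y_j \mid Y_k : k \in S\setminus\{j\}\big) \\
&\ge \mathrm{Var}\big(Y_j \mid Y^{(-j)}\big) \\
&= \mathrm{Var}\big(Y_i \mid Y^{(-i)}\big) \quad\text{(by stationarity of $Y$)} \\
&= \mathrm{Var}\big(Y_i \mid Y_k : k \in S\setminus\{i\}\big) \quad\text{(since the neighborhood of $i$ is included in $S$).}
\end{aligned}$$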
Part 3. Consider i E Ah{S). The vector (r_5^((/)))i _j
formed by the regression coefficients of Yi on {Yj,j € S \ {i}). Since the conditional variance of Yi 
given {Yj,j E 5 \ {i}) is at least cr^ (by Parts 1 and 2), we get 



> 


> 


1-Var(yi|y,- : j€S\{i}) 
Var(E{y,|y,-, jeS\{i}} 

Var f V 

4Var( 

Jes\{i} 


^ + U\ 


where the equality in the second line above we use Var(Y^ ) = 1 and the law of total variance (i.e., 
Var(y) = K[Var{Y\B)] + Var{E[Y\B])) and in the last line we use that the smallest eigenvalue of 
r((/)) (and also of r5((/))) is larger than cr^/(1 + ||0||i) (Lemma 5). Rearranging this inequality and 
using the fact that ||(/>||i < 1, we arrive at 


(r5n</>)k- 


i \\2 — 


I - (Tj 


O'j 




1 - O'? 




< 


< 




m\i 


(by (39)) 


(using Lemma 5). 


Lemma 9. For any (pi, ^2 £ ^h, define 


f det(r^^(cPi))det(r^^(P>2)) \ 

[det(rs\M + r^\c^2 )-is)J 


□ 


(Note that Vs defined in Proposition 1 equals the expected value of when (fi and (/)2 are drawn 

independently from the distribution n.) Assuming that ||(/)i||i V ||<?i>2||i < 1/5, we have 

< ^\S\{(f)i,(f 2 ) + 8Qs , 


where 

Qs 


2 

1*51 E I E Ps,sPs,,k(Ps,,k-j\ + i5\smi\\iyu^^^^ 

^ 1 ,* 2 , 53=1 jjkG'Mh 


+ 28|A2/.(5)| {\A2h{S)\ V (|N;,| + l)f/^ iWMl V mil) . 


20 









Proof. Since for any cf, the spectrum of lies between the extrema of the spectrum of T 

by Lemma 5, we have 




<71 


- 1 < {r-/{cp) - 15) < - 15) < 


-1 


(Ja 


where A™“(A) and A™^^(A) denote the smallest and largest eigenvalues of a matrix A. Since 
crl < Var (1^) = 1, the left-hand side is larger than —||(^||i, while relying on (39), we derive 


1 + UWi 

„2 


l<(ll<(>lll + l) 


ml ' 


- 1 < 


mil 


Consequently, as long as ||(/>||i < 1/5, the spectrum of r^^((/)) lies in (|, |). This allows us to use 
the Taylor series of the logarithm, which for a matrix A with spectrum in (2,2), gives 


log(det(A)) -Tr[A-l 5 ] T^Tr 





Tr 


(A - 15 ) 



Applying this expansion to rg^((/i), r^^((/2) and -|- rg^(i/)2) — I5, 


2 log< Vi + ^V 2 + 8 V 3 + 8 U 4 , 

Ui := Tt[(r^'(0i)-l5)(r5'(02)-Is)] , 


1^2 : = 

Tr 


+ 

Tr[(r5n02)-l5)'' 

5 

^3 : = 

Tr 

‘(r-i( 0 i) - Is) (r5i(02) - Is) {rs\h) - 15 )] 

U 4 : = 

Tr 

\t,\^ 2 ) - Is) (rs'(0i) - 15 ) (r5H02) - Is)] 


Control of Vi. We use the fact that 


Tr 




((^S i^2))i,j Si,j) ■ 

i,jeS 


To bound the right-hand side, first consider any node $i \in S^h$ in the $h$-interior of $S$. By the first part
of Lemma 8, the $i$-th row of $\Gamma_S^{-1}(\phi)$ equals the restriction to $S$ of the $i$-th row of $\Gamma^{-1}(\phi)$. Using
the definition of $\phi_1, \phi_2$, we therefore have


E ((rs‘{'i>i))i,i - ki) - ki) 

j&S 


< 


(!-<)(!-^^ 2 ) 

0. 2 ' 2 2 

VI2 VI2 


( 01 , 02 ) + ( 01 , 02 ) 


^ - V 


2 

</>2 


^ll^l2 



( 01 , 02 ) + 


3 II01II2 + II02II2 

2(1-||0i||i)(1-|02||i) 


(42) 
using Lemma 5 in the last line. Next, consider a node $i \in \Delta_h(S)$, near the boundary of $S$. Relying
on Lemmas 5 and 8, we get


((^5 ^i,j) — 

j&S 


’1||2 


< 


(1 - ll</>illi)^ 


5l|l2 


+ - l)' 


+ 


hWi . 3P-II2 


< 


'’1112 


since we assume that ||i?i||i < 1. By the Cauchy-Schwarz inequality, 


(43) 




(44) 


Summing (42) over i £ and (44) over i G Ah{S), we get 


Ri<|S|(<^i,02) + -|5| 


3.^. Il'/’l|l2 + II'(’2||2 ,oiA /OM ll'/’llli V ||</>2||2 


2' '(l-||^i||i)(l-||</,2||i) 


+ 3|A;,(5)| 


(1 - ||(/>l||l V ||(()2||l)^ 


Control of 4^. We proceed similarly as in the previous step. Note that 


Tr 




i,j,kGS 


S E 

ies 


Y, {i^~sH<Pl)kj - kj) k^~s\k))j,k - 5i,A:) ((r5'(<()l))fc, - 4,) 

j,k£S 


First, consider a node i in 5 \ A 2 h{S). Here, we use A 2 h{S) instead of A/i(S) so that we may 
replace Tg^{(j)) below with r^^((/)). We use again Lemma 8 to replace (F^^^(</>))by (r~^((/i))jj i 
the sum 


m 


('^l))i,fc ^j,k){(^S i4’l))k,i k,i) 

j&SkeS 


< 


< 


E 

j,k&^h 

E 

j,k£^h 


4^1, j 4l,k4l,k—j 


a 


9^1 






<i>i 


9^1 


4l,j 4l,k4l,k—j 


( 1 - 




+ 4 


Hi 


( 1 - 


lllllj 


using Lemma 6 in the last line. Next, consider a node $i \in \Delta_{2h}(S)$. If $i \notin \Delta_h(S)$, then the support
of $(\Gamma_S^{-1}(\phi_1))_{i,-i}$ is of size $|N_h|$. If $i \in \Delta_h(S)$, then $\Delta_{2h}(S)\setminus\{i\}$ separates $\{i\}$ from $S\setminus\Delta_{2h}(S)$ in
the dependency graph and the Global Markov property (Lauritzen, 1996) entails that

$$Y_i \perp \big(Y_k,\ k \in S\setminus\Delta_{2h}(S)\big) \mid \big(Y_k,\ k \in \Delta_{2h}(S)\setminus\{i\}\big),$$

and therefore the support of $(\Gamma_S^{-1}(\phi_1))_{i,-i}$ is of size smaller than $|\Delta_{2h}(S)|$. Using the Cauchy-Schwarz
inequality and (43), we get


j(^S k&S 

< -5^ 

jes 

< v'|A2h(5)|v(|N;,| + i)||(r5i(<(>i))v - 5i,.|y|(r^H</>i))v - 

i6S 

In conclusion, 


^ 1^11 Ej,fcg% + I Ej,fce% ^2,j(f>2,k4>2,k-jl ^ 


l|l2 


V 


-2|li 


(1-||<^i||iV||<(>2||i)3 

+11|A2,.(S)|(|A2„(5)|v(|N;,| + 1))1/2 


(1-||<^i||iV||02||i)3 


mi V 


2112 


( 1 - 


mi 


v||<^2||i)9/2 • 


Control of V 3 + V 4 . Arguing as above, we obtain 

^ < I I Ei,fcg% 4>l,j(l^l,k(l)2,k-j\ + I Ei,fcg% 4>l,j(pl,k(l)2,k-j\ \\cPi\\l V \\cj)2\\l 

' - ' ' (1-||<^i||iV||</)2||i)3 + ' '(1-||<^i||iV||</-2||i)3 

+ ll|A 2 »(S)|(|A 2 ft(S)| V (IN^I + 1))‘-'^ 'lu'li ((lull ' 9/2 ■ 

□ 


l+ll<Alll 


7.4 Proof of Corollary 3 

As stated in Lemma 5, all eigenvalues of the covariance operator lie in (1 — ||<^||i, ). 

Since the spectrum of lies between the extrema of the spectrum of r~^(</>), and using the 

assumption that ||(/)||i < 1/2, this entails 


2||<ii||i ||<(-|| 


||r,W-1,11 < max 

We now apply Proposition 1 with the probability measure 


< 1 , 


(45) 


Ps = 


-- (<(>)) - ^ _ y 

det(2rgy(/) - Is)V2 


TT concentrating on (j). In this case. 


and we get 


K4> > 1 


> 1 - 


> 1 - 


1 


2|C| 

1 


^det(Is-(l5-r5(</>))y-'/2 

.5eC 


2|C| [K1.2(1-lirsW-nil) 

1 


I|r5(0)-i5iy 


1/2 


1/2 


2|C| 


^exp 

.5eC 


|r5(^)-i5||| 

2(l-2||</.||i) 


1/2 
where $\|\cdot\|_F$ denotes the Frobenius norm. The second inequality above is obtained by applying the
inequality 1/(1 — A) < for 0 < A < 1 to the eigenvalues of (r5((;/i) — Is)^, while the third 

inequality follows from (45) and the fact that ||(/>||i < 1/2. It remains to bound ||r5((/i) — IsH^: 

CorHYuY,) 

v^O 

2o\sml, 

where we used Lemma 6, cr^ < 1, and ||</>||2 < ||</’||i < 1/2 in the last line. 

7.5 Proof of Theorem 1 

Recall the definition of the prior $\pi$ defined just before the statement of the theorem. Taking the
numerical constant $C_0$ in (26) sufficiently small and relying on condition (25), we have $\|\phi\|_1 =
r\sqrt{|N_h|} < 1/5$. Consequently, the support of $\pi$ is a subset of the parameter space and we are in
a position to invoke Lemma 9.

Let $\phi_1, \phi_2$ be drawn independently according to the distribution $\pi$ and denote by $\xi_1$ and $\xi_2$ the
corresponding random vectors defined on $N_h$. By Lemma 9,

< |5|r2N^i(ei,6)+8gs , 


Tsi4)-lsfF = 
< 

< 


where 

Qs < 23|5|r3vW + |A;,(5)|r2 + 28|A2,.(-S)|(|A2/,(5)| V (|N;,| + 1))^/V . 

Since $\langle\xi_1, \xi_2\rangle$ is distributed as the sum of $|N_h|/2$ independent Rademacher random variables, we
deduce that

/„2| Cl \ |f^hl/2 

Rs < cosh(^^J exp(383(|5|^N;/V|A2/,(5)|3/V + 8|Afc(5)|r2j 
- exp(^/\'^ + 383(|5|vWv|A2,(5)|3/2) + 8|A;,(5)|r2^ , 

since $\cosh(x) \le \exp(x) \wedge \exp(x^2/2)$ for any $x > 0$. Combining this bound with Proposition 1, we
conclude that the Bayes risk $R^*_{\nu,\pi}$ is bounded from below by


1 - 


,_max exp , , 

2^/\C\ see V4|Nh| 


(\3 


2^4 


A^+383 


+ 1 V 


r^ + 8|A;,(5)|r^ . (46) 


If the numerical constant C in Condition (26) is sufficiently small, then A A 0.5 log(|C|/a). 
Also, choosing Cq small enough in condition (26), relying on condition (25) and on |N/i| > 1, we 
also have 

383 (|5|vInATi V |A2A5)p/") r3 + 8|AA5)|r2 < 0.5log(|C|/a) . 

Thus, we conclude that $R^*_{\nu,\pi} \ge 1 - a$.
7.6 Proof of Corollary 4

We deduce the result by closely following the proof of Theorem 1. We first prove that $5r\sqrt{|N_h|} < 1$
is satisfied for $n$ large enough. Starting from (35), we have, for $n$ large enough,


where we used Condition (34) in the second line. Taking Ci and C 2 small enough, we only have to 
bound |N/ip/^-yiog(^)/A:. We distinguish two cases. 

• Case 1: |N/i| < log(n/A:). Since |N/i| < Ci/c/log (^), it follows that |N/i|^/^y^log(^)/A: < Ci. 

• Case 2: |N/i| > log(n//c). Then the second part of Condition (34) enforces log^/^(n/A;) < 
C\k^l'°. Using again the second part of Condition (34) yields 


Sri/jNftJ < 5C. 


l/3/^|Nfe|log(g)^^|Nfe|3/2yi^ 


V 


iN^y^yEid) , ^3/2iog‘''={n/t), ^3/2 

k *2/5 


As $5r\sqrt{|N_h|} < 1$, we can use the same prior $\pi$ as in the proof of Theorem 1 and arrive at the
same lower bound (46) on $R^*_{\nu,\pi}$. It remains to prove that this lower bound goes to one, namely that

^iNfel' + (|5'|\/|Nfe| + 1 V |A2ft(S')|^/^) + 16\A2hiS)\r‘^ - ^log(n/fc) ^ -00 , 

where S' is a hypercube of size k. Taking the constant C 2 small enough in (35) leads to ^ 

log(n/A:)/4 for n large enough. 




log(n//i:)^|N//| y / log(n/A:)^/2|Nh|^/2 

h y h 


11/2 


< V dy\ log(n/A;) 


where we used again the second part of Condition (34). Taking C\ and C 2 small enough ensures 
that 765A:r^y^|N/i| + 1 < log(n/fe)/8 for n large enough. Finally, it suffices to control |A2/i(S)|^/^r^ 
since |A2/i(S)|r^ < |A2/i(S)|^/^r^ V 1. Observe that 


I A2fe(S)| = 1 ^ -{i- Ahy = [1 - (1 - Ah/id < M^dh/i < 4d|N/,|i/'^feAr 

It then follows from Condition (35) that 


{d\nhd'^kdf/^r^ < C2/2 


'^3/2|p^^|3/(2d) 

k3/{2d) 


>w^(Dv 


n\ \ I d3/2|I^^|3/(2d)+3/4 
A;3/(2<i) logV4 


log ( Y 


< cT 


^3/(2d)^3/2 log-1/4 Y Q-^ 


6+3d 

4(i 




where we used again (34) in the second line. Choosing Ci and C 2 small enough concludes the proof. 
7.7 Proof of Proposition 2

We leave $\phi$ implicit throughout. Define

$$U_S = X_S^\top(I_S - \Gamma_S^{-1})X_S - \mathrm{Tr}(I_S - \Gamma_S^{-1}).$$

Under the null, X is standard normal, so applying the union bound and Lemma 1 gives 

1P{C/ > 4} < {[/^ > 4||l5 - r^ill^v/logp) + 4||l5 - r^^ll log(|C|)} < |C|-1. 

5ec 


Under the alternative where $S \in \mathcal{C}$ is anomalous, $X_S$ has covariance $\Gamma_S$, so that we have
$X_S^\top(I_S - \Gamma_S^{-1})X_S \sim Z^\top(\Gamma_S - I_S)Z$, where $Z$ is standard normal in dimension $|S|$. Since $\mathrm{Var}(Y_i) = 1$,
the diagonal elements of $\Gamma_S - I_S$ are all equal to zero. We apply Lemma 1 to get that


(I5 - Ts^)Xs < -2||r5 - l5||F\/log(|C|) - 2||rs - IsW log(|C|)l < |C| 


1-1 


In view of the definition of U, we have P[C/ > 4] > 1 — |C| ^ as soon as 

Tv[Ts^ - Is] > 4[||r5 - I5IIF V Iir^i - i^IIf] v'bg® + 6[||r - 1 || v nr-^ - 1 ||] iog(|C|). (47) 

Therefore, it suffices to bound $\|\Gamma_S - I_S\|_F$, $\|\Gamma_S^{-1} - I_S\|_F$, $\|\Gamma - I\|$, $\|\Gamma^{-1} - I\|$ and $\mathrm{Tr}[\Gamma_S^{-1} - I_S]$. In
the sequel, $C$ denotes a large enough positive constant depending only on $\eta$, whose value may
vary from line to line. From Lemma 6, we deduce that

||r5-Is||F<C|S|||</.||2 . 


Lemma 5 implies that 
We apply Lemma 8 to obtain 


IF-III V ||F-^ - III < C . 


||F^'-l5||| < C|S|||<^i + |S|(u^^-l)2 

< c\smi, 

where we used Lemma 5 in the second line. Finally, we use again Lemmas 8 and 5 to obtain 


Tr[F5i-l5] = |S"|^^+ ^ 


^ j&AUS) 

> |S"|^^>C|S|||<()||i. 


Consequently, (47) holds as soon as IS'lUt^iHl > Clogd^l). 


7.8 Proof of Theorem 2 

We use $C$, $C'$, $C''$ as generic positive constants, whose actual values may change with each appearance.

Under the null hypothesis. First, we bound the $1 - \alpha$ quantile of $T^*$ under the null hypothesis.
Denote $Z_S := \|\Pi_{S,h}X_{S,h}\|_2^2$, so that $T_S = |S^h|\,Z_S\,[\|X_{S,h}\|_2^2 - Z_S]^{-1}$. Since $Z_S$ is the squared norm of
the projection of $X_{S,h}$ onto the column space of $F_{S,h}$, we can express $Z_S$ as a least-squares criterion:

$$Z_S = \max_{\phi\in\mathbb{R}^{N_h}}\ \|X_{S,h}\|_2^2 - \|X_{S,h} - F_{S,h}\phi\|_2^2.$$

Given cj) G , define the matrix B 0^5 G such that for any i G S"^, and any j, = (pj, 

and all the remaining entries of B,^ 5 are zero. It then follows that 


Zg = max Tr 


T 




Rfi 


,s ■= 


lg-Bls){lg-B^,g)-ls 


(48) 


Observe that $Z_S$ can be seen as the supremum of a Gaussian chaos of order 2. As the collection of
matrices in the supremum of (48) is not bounded, we cannot directly apply Lemma 3. Nevertheless,
upon defining $\tilde{Z}_S := \max_{\|\phi\|_1\le 1}\mathrm{Tr}[R_{\phi,S}X_SX_S^\top]$, we have for any $t > 0$,

$$\mathbb{P}[Z_S \ge t] \le \mathbb{P}[\tilde{Z}_S \ge t] + \mathbb{P}[Z_S \ne \tilde{Z}_S], \qquad (49)$$

and we can control the deviations of $\tilde{Z}_S$ using Lemma 3. Observe that for any $\phi$ with $\|\phi\|_1 \le 1$,
$\|I_S - B_{\phi,S}\| \le 2$, so that $\|R_{\phi,S}\| \le 3$. Choose $\hat\phi_S$ among the $\phi$'s achieving the maximum in (48),
and note that $\mathbb{P}[Z_S \ne \tilde{Z}_S] = \mathbb{P}[\|\hat\phi_S\|_1 > 1]$. We bound the right-hand side below. In view of
Lemma 3, we also need to bound $\mathbb{E}[\tilde{Z}_S]$ and $\mathbb{E}\big[\sup_{\|\phi\|_1\le 1}\mathrm{Tr}(R_{\phi,S}X_SX_S^\top R_{\phi,S})\big]$ in order to control
$\mathbb{P}[\tilde{Z}_S \ge t]$.

Control of $\mathbb{P}[\|\hat\phi_S\|_1 > 1]$. When $F_{S,h}^\top F_{S,h}$ is invertible, $\hat\phi_S = (F_{S,h}^\top F_{S,h})^{-1}F_{S,h}^\top X_{S,h}$. By the Cauchy-Schwarz
inequality,


i>s\\i > 1 ] < 
< 

< 


P[||0S||2 > |%|-l/2] 
A““(fJ,Fs,,) < ilS'^l 
A““(fT,Fs,.) < i|S"| 


+ 1 
+ 1 


|Fs,/i-’^S,h ||2 > 
|Fs,/iAs,h||oo > 


]S^ 


2|N;,|V2 

■ 

m^\_ 


(50) 


First, we control the smallest eigenvalue of $F_{S,h}^\top F_{S,h}$. Under the null hypothesis, the vectors $F_i$
follow the standard normal distribution, but $F_{S,h}^\top F_{S,h}$ is not a Wishart matrix since the vectors $F_i$
are correlated. However, $F_{S,h}^\top F_{S,h}$ decomposes as a sum of $|N_h| + 1$ (possibly dependent) standard
Wishart matrices. Indeed, define

$$S_i = S^h \cap \{i + (2h+1)v,\ v \in \mathbb{Z}^d\}, \qquad i \in N_h \cup \{0\}, \qquad (51)$$

and then $A_i = \sum_{j\in S_i}F_j^\top F_j$. The vectors $(F_j, j \in S_i)$ are independent since the minimum $\ell_\infty$
distance between any two nodes in $S_i$ is at least $2h + 1$, so that $A_i$ is standard Wishart. Denoting
$n_i = |S_i|$, we are in a position to apply Lemma 4, to get

$$\mathbb{P}\Big\{\lambda^{\min}(A_i) \le n_i - 2\sqrt{|N_h|n_i} - 2\sqrt{2xn_i}\Big\} \le e^{-x}, \qquad \forall x > 0.$$


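Since $(S_i : i \in \mathbb{N}_h \cup \{0\})$ forms a partition of $S^h$, we have $F_{S,h}^\top F_{S,h} = \sum_{i \in \mathbb{N}_h \cup \{0\}} A_i$;
in particular, $\lambda^{\min}(F_{S,h}^\top F_{S,h}) \ge \sum_i \lambda^{\min}(A_i)$. Using this, the tail bound for $\lambda^{\min}(A_i)$ with $x \leftarrow
x + \log(|\mathbb{N}_h| + 1)$, some simplifying algebra, and the union bound, we conclude that, for all $x > 0$,

\[ \mathbb{P}\Big\{ \lambda^{\min}(F_{S,h}^\top F_{S,h}) \le |S^h| - 4(|\mathbb{N}_h| + 1)\sqrt{|S^h|} - 3\sqrt{(|\mathbb{N}_h| + 1)|S^h| x} \Big\} \le e^{-x} , \tag{52} \]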
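since

\[ \sum_{i \in \mathbb{N}_h \cup \{0\}} \Big( n_i - 2\sqrt{|\mathbb{N}_h| n_i} - 2\sqrt{2 x n_i} \Big) \ge \sum_{i \in \mathbb{N}_h \cup \{0\}} n_i - 2\big(\sqrt{|\mathbb{N}_h|} + \sqrt{2x}\big) \sum_{i \in \mathbb{N}_h \cup \{0\}} \sqrt{n_i} , \]

with $\sum_{i \in \mathbb{N}_h \cup \{0\}} n_i = |S^h|$ and $\sum_{i \in \mathbb{N}_h \cup \{0\}} \sqrt{n_i} \le \sqrt{|S^h|(|\mathbb{N}_h| + 1)}$, by the Cauchy-Schwarz inequality.
Taking $x = C|S^h|/(|\mathbb{N}_h| + 1)$ in the above inequality for a sufficiently small constant $C$ and
relying on Condition (29), we get

\[ \mathbb{P}\Big\{ \lambda^{\min}(F_{S,h}^\top F_{S,h}) \le \tfrac{1}{2}|S^h| \Big\} \le \exp\big( -C|S^h|/(|\mathbb{N}_h| + 1) \big) . \]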
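The partition argument can be illustrated on a small lattice. The sketch below assumes a hypothetical $7 \times 7$ square playing the role of $S^h$ with $h = 1$; it groups nodes by their coordinates modulo $2h+1$, which yields the same classes as (51), and checks the separation and the Cauchy-Schwarz step. The function names are illustrative only.

```python
import itertools
import numpy as np

def residue_classes(nodes, h):
    """Group lattice nodes by their coordinates modulo (2h + 1)."""
    classes = {}
    for node in nodes:
        key = tuple(c % (2 * h + 1) for c in node)
        classes.setdefault(key, []).append(node)
    return classes

def min_sup_distance(points):
    """Smallest l_infinity distance between two distinct points (inf if fewer than 2)."""
    best = float("inf")
    for p, q in itertools.combinations(points, 2):
        best = min(best, max(abs(a - b) for a, b in zip(p, q)))
    return best

h, side = 1, 7
nodes = list(itertools.product(range(side), repeat=2))   # hypothetical S^h
classes = residue_classes(nodes, h)
sizes = np.array([len(c) for c in classes.values()], dtype=float)   # the n_i

# Cauchy-Schwarz: sum_i sqrt(n_i) <= sqrt(|S^h| * number of classes)
lhs = np.sqrt(sizes).sum()
rhs = np.sqrt(len(nodes) * len(classes))
```

In dimension 2 with $h = 1$ there are $(2h+1)^2 = 9 = |\mathbb{N}_h| + 1$ classes, and two distinct nodes of the same class are at $\ell_\infty$ distance at least $2h+1 = 3$.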

We now turn to bounding $\|F_{S,h}^\top X_{S,h}\|_\infty$. Each component of $F_{S,h}^\top X_{S,h}$ is of the form $Q_v :=
\sum_{i \in S^h} X_i X_{i+v}$ for some $v \in \mathbb{N}_h$. Note that $Q_v$ is a quadratic form of $|S|$ standard normal variables,
and the corresponding symmetric matrix has zero trace, Frobenius norm equal to $\sqrt{|S^h|/2}$, and
operator norm smaller than 1 by diagonal dominance. Combining Lemma 1 with a union bound,
we get

\[ \mathbb{P}\Big\{ \|F_{S,h}^\top X_{S,h}\|_\infty > \sqrt{2|S^h|(x + |\mathbb{N}_h|)} + 2(x + |\mathbb{N}_h|) \Big\} \le 2e^{-x} , \qquad \forall x > 0 . \]
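The three properties of the symmetric matrix behind the quadratic form $\sum_{i \in S^h} X_i X_{i+v}$ (zero trace, squared Frobenius norm $|S^h|/2$, operator norm at most 1) can be verified directly. This is a minimal one-dimensional sketch; the path length `m` and the function `cross_product_matrix` are assumptions made for illustration.

```python
import numpy as np

def cross_product_matrix(m, h, v):
    """Symmetric R with x^T R x = sum over interior i of x_i * x_{i+v} on a path of m nodes."""
    R = np.zeros((m, m))
    for i in range(h, m - h):          # interior nodes, playing the role of S^h
        R[i, i + v] += 0.5
        R[i + v, i] += 0.5
    return R

m, h, v = 50, 3, 2
R = cross_product_matrix(m, h, v)
n_interior = m - 2 * h                 # |S^h| in this toy setting

x = np.random.default_rng(1).standard_normal(m)
quad = x @ R @ x
direct = sum(x[i] * x[i + v] for i in range(h, m - h))

trace = np.trace(R)
fro_sq = np.linalg.norm(R, "fro") ** 2
op_norm = np.linalg.norm(R, 2)         # largest singular value
```

Each row of `R` has at most two entries equal to 1/2, so its $\ell_1$ row norm is at most 1, which gives the operator-norm bound.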

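Taking $x = C|S^h|/|\mathbb{N}_h|^2$ in the above inequality for a sufficiently small constant $C$ and using once
again Condition (29) allows us to get the bound

\[ \mathbb{P}\Big\{ \|F_{S,h}^\top X_{S,h}\|_\infty > \frac{|S^h|}{2|\mathbb{N}_h|} \Big\} \le 2\exp\big[ -C|S^h|/|\mathbb{N}_h|^2 \big] . \]

Plugging these bounds into (50), we conclude that

\[ \mathbb{P}[\|\hat\phi_S\|_1 > 1] \le 3\exp\big[ -C|S^h|/|\mathbb{N}_h|^2 \big] . \tag{53} \]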
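Control of $\mathbb{E}[\bar{Z}_S]$. Since

\[ \bar{Z}_S \le Z_S = \|\Pi_{S,h} X_{S,h}\|_2^2 \le \big\|(F_{S,h}^\top F_{S,h})^{-1}\big\| \, \|F_{S,h}^\top X_{S,h}\|_2^2 \wedge \|X_{S,h}\|_2^2 , \]

we have, for any $a > 0$,

\[
\begin{aligned}
\mathbb{E}[\bar{Z}_S] &\le a\,\mathbb{E}\big[\|F_{S,h}^\top X_{S,h}\|_2^2\big] + \mathbb{E}\Big[\|X_{S,h}\|_2^2 \, \mathbf{1}\big\{\|(F_{S,h}^\top F_{S,h})^{-1}\| > a\big\}\Big] \\
&\le a\,\mathbb{E}\big[\|F_{S,h}^\top X_{S,h}\|_2^2\big] + \sqrt{\mathbb{P}\big\{\|(F_{S,h}^\top F_{S,h})^{-1}\| > a\big\}\, \mathbb{E}\big[\|X_{S,h}\|_2^4\big]} ,
\end{aligned}
\]

where we used the Cauchy-Schwarz inequality in the second line. Since, under the null, $X_S \sim
\mathcal{N}(0, I_S)$, it follows that $\mathbb{E}[\|F_{S,h}^\top X_{S,h}\|_2^2] = |\mathbb{N}_h||S^h|$ and $\mathbb{E}[\|X_{S,h}\|_2^4] = |S^h|(|S^h| + 2)$. Gathering
this, the deviation inequality (52) with $x = C|S^h|/|\mathbb{N}_h|^2$ with a small constant $C > 0$, and Condition
(29), and choosing as threshold $a = \big(|S^h|(1 - |\mathbb{N}_h|^{-1/2})\big)^{-1}$ leads to

\[
\begin{aligned}
\mathbb{E}[\bar{Z}_S] &\le \frac{|\mathbb{N}_h|}{1 - |\mathbb{N}_h|^{-1/2}} + \sqrt{3}\,|S^h| \sqrt{\mathbb{P}\big\{\lambda^{\min}(F_{S,h}^\top F_{S,h}) \le 1/a\big\}} \\
&\le |\mathbb{N}_h| + C|\mathbb{N}_h|^{1/2} + \sqrt{3}\,|S^h| \exp\big(-C|S^h|/|\mathbb{N}_h|^2\big) \\
&\le |\mathbb{N}_h| + C|\mathbb{N}_h|^{1/2} .
\end{aligned}
\tag{54}
\]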
Control of $\mathbb{E}\big[\sup_{\|\phi\|_1 \le 1} \mathrm{Tr}(R_{\phi,S} X_S X_S^\top R_{\phi,S})\big]$. As explained above, $\|R_{\phi,S}\| \le 3$ and we are therefore
able to bound this expectation in terms of $\mathbb{E}[\bar{Z}_S]$ as follows:

\[ \mathbb{E}\Big[ \sup_{\|\phi\|_1 \le 1} \mathrm{Tr}(R_{\phi,S} X_S X_S^\top R_{\phi,S}) \Big] \le 3\,\mathbb{E}[\bar{Z}_S] \le C|\mathbb{N}_h| , \tag{55} \]

where we used (54) in the last inequality.

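Combining the decomposition (49) with Lemma 3 and (53), (54) and (55), we obtain

\[ \mathbb{P}\Big\{ Z_S \ge |\mathbb{N}_h| + C\big(|\mathbb{N}_h|^{1/2} + \sqrt{|\mathbb{N}_h| t} + t\big) \Big\} \le e^{-t} + 3\exp\big(-C|S^h|/|\mathbb{N}_h|^2\big) , \qquad \forall t > 0 . \]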

Since

\[ T_S = \frac{|S^h| Z_S}{\|X_{S,h}\|_2^2 - Z_S} , \]

where $\|X_{S,h}\|_2^2$ follows a $\chi^2$ distribution with $|S^h|$ degrees of freedom, from Lemma 1 we derive

\[ \mathbb{P}\Big\{ \|X_{S,h}\|_2^2 \le |S^h| - 2\sqrt{|S^h| t} - 2t \Big\} \le e^{-t} \]

for any t > 0, and from these two deviation inequalities, we get, for all t < 


Ts > 


+ C'(|N;,|V2 + ^,^^ + i) 


1-C 


\Sh\ ^ |5^| 


< 2e * + 3 exp 


-C 


|N;,|2 


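Finally, we take a union bound over all $S \in \mathcal{C}$ and invoke again Condition (29) to conclude that,
for any $t \le C|S^h|$,

\[ \mathbb{P}\Big\{ \max_{S \in \mathcal{C}} T_S \ge |\mathbb{N}_h| + C\Big(\sqrt{|\mathbb{N}_h|(\log(|\mathcal{C}|) + 1 + t)} + \log(|\mathcal{C}|) + t\Big) \Big\} \le 2e^{-t} + 3|\mathcal{C}| \exp\big(-C|S^h|/|\mathbb{N}_h|^2\big) . \]

To conclude, we let $t = \log(1/(4\alpha))$ in the above inequality, and use the condition on $\alpha$ in the
statement of the theorem together with Condition (29), to get the following control of $T^*$ under
the null hypothesis:

\[ \mathbb{P}\Big\{ T^* \ge |\mathbb{N}_h| + C\Big(\sqrt{|\mathbb{N}_h|(\log(|\mathcal{C}|) + 1 + \log(\alpha^{-1}))} + \log(|\mathcal{C}|) + \log(\alpha^{-1})\Big) \Big\} \le \alpha . \]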


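Under the alternative hypothesis. Next we study the behavior of the test statistic $T^*$ under
the assumption that there exists some $S \in \mathcal{C}$ such that $X_S = Y_S \sim \mathcal{N}(0, \Gamma_S(\phi))$. Since $T^* \ge T_S$,
it suffices to focus on this particular $T_S$. For any $i \in S^h$, recall that $Y_i = \phi^\top F_i + \epsilon_i$ where
$F_i = (Y_{i+v} : 1 \le |v|_\infty \le h)$ and $\epsilon_i$ is independent of $F_i$. Hence, $Z_S$ decomposes as

\[
\begin{aligned}
Z_S &= \|\Pi_{S,h} Y_{S,h}\|_2^2 \\
&= \|F_{S,h}\phi + \Pi_{S,h}\epsilon_{S,h}\|_2^2 \\
&= \|F_{S,h}\phi\|_2^2 + 2\phi^\top F_{S,h}^\top \epsilon_{S,h} + \|\Pi_{S,h}\epsilon_{S,h}\|_2^2 = (\mathrm{I}) + (\mathrm{II}) + (\mathrm{III}) .
\end{aligned}
\]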
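The three-term decomposition is a purely algebraic identity: it holds for any design matrix, any parameter vector and any residual vector, because the projection leaves $F_{S,h}\phi$ unchanged. The sketch below checks it with random stand-ins (the dimensions and the scaling of `phi` are arbitrary choices, and the stand-ins do not reproduce the dependence structure of the Markov random field).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 120, 6
F = rng.standard_normal((n, p))        # stand-in for F_{S,h}
phi = 0.1 * rng.standard_normal(p)     # stand-in for the parameter phi
eps = rng.standard_normal(n)           # stand-in for eps_{S,h}
Y = F @ phi + eps                      # Y_i = phi^T F_i + eps_i

Q, _ = np.linalg.qr(F)                 # orthonormal basis of col(F)

def proj(z):
    """Orthogonal projection of z onto the column space of F."""
    return Q @ (Q.T @ z)

Z = np.sum(proj(Y) ** 2)               # ||Pi Y||^2
term_I = np.sum((F @ phi) ** 2)        # ||F phi||^2
term_II = 2.0 * phi @ (F.T @ eps)      # 2 phi^T F^T eps
term_III = np.sum(proj(eps) ** 2)      # ||Pi eps||^2

# The residual sum of squares only involves eps:
den_Y = Y @ Y - Z
den_eps = eps @ eps - term_III
```

The last two quantities coincide as well, which is the identity used for the denominator of $T_S$.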

To bound the numerator of Ts, we bound each of these three terms. (I) and (II) are simply quadratic 
functions of multivariate normal random vectors and we control their deviations using Lemma 1. 
In contrast, (III) is more intricate and we use an ad-hoc method. In order to structure the proof, 
we state four lemmas needed in our calculations. We provide proofs of the lemmas further down. 
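Lemma 10. Under condition (29), there exists a numerical constant $C > 0$ such that

\[ \mathbb{P}\Big\{ (\mathrm{I}) \ge \frac{\sigma_\phi^2 |S^h| \|\phi\|_2^2}{2(1 + \|\phi\|_1)} \Big\} \ge 1 - \exp\Big( -C \frac{|S^h|}{|\mathbb{N}_h|} \Big) . \tag{56} \]

Lemma 11. For any $t > 0$,

\[ \mathbb{P}\Big\{ (\mathrm{II}) \ge -2\sigma_\phi \|\phi\|_2 \sqrt{2|S^h| \|\Gamma(\phi)\| (2 + \|\phi\|_1) t} - 12\big[\|\Gamma(\phi)\| \vee (1 + \|\phi\|_1)\sigma_\phi^2\big] t \Big\} \ge 1 - e^{-t} , \tag{57} \]

\[ \mathbb{P}\Big\{ (\mathrm{II}) \ge -2\sqrt{2}\,\sigma_\phi \sqrt{(|\mathbb{N}_h| + 1)\big[\log(|\mathbb{N}_h| + 1) + t\big]} \, \|F_{S,h}\phi\|_2 \Big\} \ge 1 - e^{-t} . \tag{58} \]

Recall that $\gamma_j := (\Gamma(\phi))_{0,j}$ denotes the covariance between $Y_0$ and $Y_j$.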

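Lemma 12. Denote by $\Gamma_{\mathbb{N}_h}(\phi)$ the covariance matrix of $(Y_i, i \in \mathbb{N}_h)$. For any $t \le |S^h|$,

\[ \lambda^{\max}\Big( \Gamma_{\mathbb{N}_h}(\phi)^{-1/2} \frac{F_{S,h}^\top F_{S,h}}{|S^h|} \Gamma_{\mathbb{N}_h}(\phi)^{-1/2} \Big) \le 1 + 4\|\Gamma^{-1}(\phi)\| \|\Gamma(\phi)\| \frac{|\mathbb{N}_h|}{|S^h|^{1/2}} \big(\sqrt{t} + \log(|\mathbb{N}_h|)\big) \tag{59} \]

with probability larger than $1 - 2e^{-t}$. Also, for any $t \ge 1$,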


|rN,(0)-'''"FjAes.»l 




> |N,,| - C f |N/,|||,/.||i + ||r 


-1 


2 + 






with probability larger than 1 — 2e *. 

To bound the denominator of $T_S$, we start from the inequality

\[ \|Y_{S,h}\|_2^2 - \|\Pi_{S,h} Y_{S,h}\|_2^2 = \|\epsilon_{S,h}\|_2^2 - \|\Pi_{S,h}\epsilon_{S,h}\|_2^2 \le \|\epsilon_{S,h}\|_2^2 \]

and then use the following result.

Lemma 13. Under condition (29), we have 

T < al\S'^\{l + |N,|-i/2)| > 1 _ 


r 

(60) 


(61) 


With these lemmas in hand, we divide the analysis into two cases depending on the value of
$\|\phi\|_2^2$. For small $\|\phi\|_2^2$, the operator norm of the covariance operator $\Gamma(\phi)$ remains bounded, which
simplifies some deviation inequalities. For large $\|\phi\|_2^2$, we are only able to get looser bounds which
are nevertheless sufficient as in that case $\|\phi\|_2^2$ is far above the detection threshold.

Case 1: $\|\phi\|_2^2 \le (4|\mathbb{N}_h|)^{-1}$. This implies that $\|\phi\|_1 \le 1/2$ and also that $\|\Gamma(\phi)\| \le 2\sigma_\phi^2$ by Lemma 5.
Combining (56) and (57) together with the inequality $2xy \le x^2 + y^2$, we derive that for any $t > 0$,


(I) + (11) 


a 


2 — 


> C 15' 


~ih 


— t 


(62) 


with probability larger than 1 — e * — exp . Turning to the third term, we have 


(III) 


a 


2 — 


> A” 




-1 
Let $a > 0$ be a positive constant whose value we determine later. For any $t > 0$, with probability
larger than $1 - 4e^{-t}$, we have


(III) 


a 


|N;,|-C'(|N;,|||,/.||i + 


.2 — 


2 , l%|«/2+|%|2(E,vo7?)^ .2 


i + 4 ||r 


(v/t + log(|N,,|)) 


> 


|N;,|-C(|N;,|3/2||</,||2+||0||2 + 


|N.|V^ + |N,P(E,^o7|) \ ^2 _ IN.I 






(yi + lo, 


> |N;,|-C 




> |N,,|-a|5'‘|||,/.||i-C|a 


> 


iNhl - a\S’^\U\\l - (1 + + l)t^) 


VW\J 

-i|N ,|^4 , , |N;,|log(|N,|) , ^4,' 


Here in the first line, we used Lemma 12. In the second line, we used the fact that $(1 - y)/(1 + x) \ge
1 - x - y$ for all $x, y \ge 0$, $\|\phi\|_1 \le \sqrt{|\mathbb{N}_h|}\, \|\phi\|_2$ by the Cauchy-Schwarz inequality, and $\|\Gamma(\phi)\| \vee$
||r^^((/>)|| < 2. In the third line, we applied the inequality — ^ll^lll + 16||(/>||2 < 20, which 

is a consequence of ||(()||i < 1/2 and Lemma 6. The last line is a consequence of Condition (29). 
Then, we take $a = C/2$ with $C$ as in (62) and apply Lemma 13 to control the denominator of $T_S$.
This leads to


// I I 

P jr^ > (7|5^|||(/||i + iNfel - (7'v^(l V t^)| > 1 - 4e-‘ - 2e"^'W . 

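Taking $t = \log(8/\beta)$ and letting $C_2$ be small enough in (30), we get

\[ \mathbb{P}\Big\{ T_S \ge C|S^h| \|\phi\|_2^2 + |\mathbb{N}_h| - C'\sqrt{|\mathbb{N}_h|}\,\big(1 + \log^2(\beta^{-1})\big) \Big\} \ge 1 - \beta , \]

proving (32) in Case 1.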

Case 2: $\|\phi\|_2^2 > (4|\mathbb{N}_h|)^{-1}$. This condition entails

i + uwi ~ ■ 

Since the term (III) is non-negative, we can start from the lower bound Zs > (I) + (II). We derive 
from Lemma 10 and the above inequality that 


(I) > |5 



> 1 — exp 



(63) 


Taking $t = C|S^h|/|\mathbb{N}_h|^2$ in (58) for a constant $C$ sufficiently small, and using Condition (29), we
get that $(\mathrm{II}) \ge -3\sqrt{C}\,\sigma_\phi \sqrt{|S^h|/|\mathbb{N}_h|}\,\sqrt{(\mathrm{I})}$ with probability at least $1 - e^{-t}$. Also, $\|\phi\|_2^2 > (4|\mathbb{N}_h|)^{-1}$
implies that the right-hand side exceeds $-(\mathrm{I})/2$ when the event in (63) holds and $C$ is small enough.
Hence, we get


P (I) + (H) > |5' 




h \"</< 


8v^ 


>1 — 2 exp 
Finally, we combine this bound with (61) and the condition $\|\phi\|_2^2 > (4|\mathbb{N}_h|)^{-1}$ to get


where we used the condition on j3. In view of Condition (29), we have proved (32). This concludes 
the proof of Theorem 2. It remains to prove the auxiliary lemmas. 

7.8.1 Proof of Lemma 10 

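Recall the definition of $S_i$ in (51). Let $F_{S_i}$ denote the matrix with row vectors $F_j$, $j \in S_i$. We have

\[ (\mathrm{I}) = \|F_{S,h}\phi\|_2^2 = \sum_{i \in \mathbb{N}_h \cup \{0\}} \|F_{S_i}\phi\|_2^2 . \]

For any $u \in \mathbb{R}^{S_i}$,

\[ \mathrm{Var}\Big( \sum_{j \in S_i} u_j \phi^\top F_j \Big) = \mathrm{Var}\Big( \sum_{j \in S_i} \sum_{v \in \mathbb{N}_h} u_j \phi_v Y_{j+v} \Big) \ge \|u\|_2^2 \|\phi\|_2^2 \lambda^{\min}(\Gamma(\phi)) , \]

since the indices $(v + j : v \in \mathbb{N}_h, j \in S_i)$ are all distinct. Since $\lambda^{\min}(\Gamma(\phi)) \ge \sigma_\phi^2/(1 + \|\phi\|_1)$
by Lemma 5, the variable $\frac{1 + \|\phi\|_1}{\sigma_\phi^2 \|\phi\|_2^2} \|F_{S_i}\phi\|_2^2$ is stochastically lower bounded by a $\chi^2$ distribution with $|S_i|$ degrees of freedom.

By Lemma 1 and the union bound, we have that for any $t > 0$,

\[
\begin{aligned}
\frac{1 + \|\phi\|_1}{\sigma_\phi^2 \|\phi\|_2^2}\,(\mathrm{I}) &\ge \sum_{i \in \mathbb{N}_h \cup \{0\}} \Big( |S_i| - 2\sqrt{|S_i|\big[\log(|\mathbb{N}_h| + 1) + t\big]} \Big) \\
&\ge |S^h| - 2\sqrt{|S^h|(|\mathbb{N}_h| + 1)\big[\log(|\mathbb{N}_h| + 1) + t\big]}
\end{aligned}
\]

with probability larger than $1 - e^{-t}$. Finally we set $t = |S^h|/(32(1 + |\mathbb{N}_h|))$ and use Condition (29) to
conclude.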


7.8.2 Proof of Lemma 11 

We first prove (57). Denote by the covariance matrix of the random vector (ej^,of 
size 2|S'^|. Let R be the block matrix defined by 


R = 




" 1 /2 1 /2 

Letting Z be a standard Gaussian vector of size 2|S'^|, we have 2c/)^F~^f^es^h ~ • 

From Lemma 1 we get that for all t > 0, with probability at least 1 — e“*, 















where we used the fact that = E[(/)"'"Fg ^£5^/1] 

bound the Frobenius norm above, we start from the identity 




= Var[2</.TFJ_;,6S,/.] = E (2</.' F^^^es,hf 


0 and that ||R|| = 1. In order to 


4 E[eieji4>~^Fi){4>'^Fj)] , 


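with $\epsilon_i$ being the $i$th component of $\epsilon_{S,h}$. For $i = j$, the expectation on the right-hand side is
$\sigma_\phi^2 \mathbb{E}[(\phi^\top F_i)^2]$, while if the distance between $i$ and $j$ is larger than $h$, then $\epsilon_i$ and $(\epsilon_j, F_i, F_j)$ are
independent and the expectation on the right-hand side is zero. If $1 \le |i - j|_\infty \le h$, then we use
Isserlis' theorem, together with the fact that $\epsilon_i \perp F_i$, to obtain

\[
\begin{aligned}
\big|\mathbb{E}[\epsilon_i \epsilon_j (\phi^\top F_i)(\phi^\top F_j)]\big| &= \big|\mathbb{E}[\epsilon_i \epsilon_j]\,\mathbb{E}[(\phi^\top F_i)(\phi^\top F_j)] + \mathbb{E}[\epsilon_i \phi^\top F_j]\,\mathbb{E}[\epsilon_j \phi^\top F_i]\big| \\
&\le \sigma_\phi^2 |\phi_{i-j}|\, \mathbb{E}[(\phi^\top F_i)^2] + \phi_{i-j}^2 \sigma_\phi^4 .
\end{aligned}
\]

Putting all the terms together, we obtain

\[ \mathrm{Var}\big[2\phi^\top F_{S,h}^\top \epsilon_{S,h}\big] \le 4\sigma_\phi^2 |S^h| \|\phi\|_2^2 \|\Gamma(\phi)\| (2 + \|\phi\|_1) , \]

using the fact that $\|\Gamma(\phi)\| \ge 1$.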

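Turning to the operator norm of the covariance matrix of $(\epsilon_{S,h}^\top, (F_{S,h}\phi)^\top)$, denote by $\Gamma^{\epsilon}(\phi)$ the covariance of the process $(\epsilon_i, i \in \mathbb{Z}^d)$. By Lemma 7,
$(\Gamma^{\epsilon}(\phi))_{i,j} = [-\phi_{i-j} + \mathbf{1}_{i=j}]\,\sigma_\phi^2$, and it follows that $\|\Gamma^{\epsilon}(\phi)\| \le (1 + \|\phi\|_1)\sigma_\phi^2$. Then, for all vectors
$u, v \in \mathbb{R}^{S^h}$,

\[
\begin{aligned}
\mathrm{Var}\Big( \sum_{i \in S^h} u_i \phi^\top F_i + \sum_{i \in S^h} v_i \epsilon_i \Big) &= \mathrm{Var}\Big( \sum_{i \in S^h} u_i Y_i + \sum_{i \in S^h} (v_i - u_i)\epsilon_i \Big) \\
&\le 2\,\mathrm{Var}\Big( \sum_{i \in S^h} u_i Y_i \Big) + 2\,\mathrm{Var}\Big( \sum_{i \in S^h} (v_i - u_i)\epsilon_i \Big) \\
&\le 2\|u\|_2^2 \|\Gamma(\phi)\| + 2\|u - v\|_2^2 \|\Gamma^{\epsilon}(\phi)\| \\
&\le 6\big(\|u\|_2^2 + \|v\|_2^2\big)\big[\|\Gamma(\phi)\| \vee \|\Gamma^{\epsilon}(\phi)\|\big] .
\end{aligned}
\]

Consequently, the operator norm of this covariance matrix is at most $6[\|\Gamma(\phi)\| \vee \|\Gamma^{\epsilon}(\phi)\|] \le 6[\|\Gamma(\phi)\| \vee (1 + \|\phi\|_1)\sigma_\phi^2]$.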

We conclude that (57) holds by virtue of the two bounds we obtained for the two terms in (64). 


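Turning to (58), we decompose (II) into $2\sum_{i \in \mathbb{N}_h \cup \{0\}} \phi^\top F_{S_i}^\top \epsilon_{S_i}$. For any $j_1 \ne j_2 \in S_i$, $|j_1 - j_2|_\infty \ge
2h + 1$ and therefore $\epsilon_{j_1}$ is independent of $(Y_{j_2 + u}, u \in \mathbb{N}_h \cup \{0\})$. Since $\epsilon_{j_2}$ and $F_{j_2}^\top \phi$ are linear
combinations of this collection, we conclude that $\epsilon_{j_1} \perp (\epsilon_{j_2}, \phi^\top F_{j_2})$. Consequently, $\epsilon_{S_i}/\sigma_\phi$ follows a
standard normal distribution and is independent of $F_{S_i}\phi$. By conditioning on $F_{S_i}\phi$ and applying
a standard Gaussian concentration inequality, we get

\[ \mathbb{P}\Big\{ \phi^\top F_{S_i}^\top \epsilon_{S_i} \le -\sigma_\phi \|F_{S_i}\phi\|_2 \sqrt{2t} \Big\} \le e^{-t} , \]

for any $t > 0$. We then take a union bound over all $i \in \mathbb{N}_h \cup \{0\}$. For any $t > 0$,

\[
\begin{aligned}
(\mathrm{II}) &\ge -2\sqrt{2}\,\sigma_\phi \sqrt{\log(|\mathbb{N}_h| + 1) + t} \sum_{i \in \mathbb{N}_h \cup \{0\}} \|F_{S_i}\phi\|_2 \\
&\ge -2\sqrt{2}\,\sigma_\phi \sqrt{\log(|\mathbb{N}_h| + 1) + t}\, \sqrt{|\mathbb{N}_h| + 1}\, \|F_{S,h}\phi\|_2 ,
\end{aligned}
\]

with probability larger than $1 - e^{-t}$.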
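7.8.3 Proof of Lemma 12

Proof of (59). Fix $(v_1, v_2) \in \mathbb{N}_h^2$ and consider the random variable

\[ (F_{S,h}^\top F_{S,h})_{v_1,v_2} = \sum_{i \in S^h} Y_{i+v_1} Y_{i+v_2} = Y_S^\top R\, Y_S = V^\top \Gamma_S(\phi)^{1/2} R\, \Gamma_S(\phi)^{1/2} V , \]

which constitutes a definition for the symmetric matrix $R$, and $V \sim \mathcal{N}(0, I_S)$. Observe that
$\|R\|_F^2 \le |S^h|$ and $\|R\| \le 1$ as the $\ell_1$ norm of each row of $R$ is smaller than one. We derive
from Lemma 1, and the fact that $\|\Gamma_S(\phi)^{1/2} R\, \Gamma_S(\phi)^{1/2}\|_F^2 \le \|R\|_F^2 \|\Gamma_S(\phi)\|^2 \le |S^h| \|\Gamma(\phi)\|^2$ and
$\|\Gamma_S(\phi)^{1/2} R\, \Gamma_S(\phi)^{1/2}\| \le \|R\| \|\Gamma_S(\phi)\| \le \|\Gamma(\phi)\|$, that for any $t > 0$,

\[ \mathbb{P}\Big\{ \big| (F_{S,h}^\top F_{S,h})_{v_1,v_2} - |S^h| \gamma_{v_1 - v_2} \big| > 2\|\Gamma(\phi)\| \sqrt{|S^h| t} + 2\|\Gamma(\phi)\| t \Big\} \le 2e^{-t} . \]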


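Then we bound the $\ell_2$ operator norm of $|S^h|^{-1} F_{S,h}^\top F_{S,h} - \Gamma_{\mathbb{N}_h}(\phi)$ by its $\ell_1$ operator norm and
combine the above deviation inequality with a union bound over all $(v_1, v_2) \in \mathbb{N}_h^2$. Thus, for any
$t \le |S^h|$,

\[
\begin{aligned}
\Big\| \frac{F_{S,h}^\top F_{S,h}}{|S^h|} - \Gamma_{\mathbb{N}_h}(\phi) \Big\| &\le \sup_{v_1 \in \mathbb{N}_h} \sum_{v_2 \in \mathbb{N}_h} \Big| \frac{(F_{S,h}^\top F_{S,h})_{v_1,v_2}}{|S^h|} - \gamma_{v_1 - v_2} \Big| \\
&\le 2\|\Gamma(\phi)\|\, |\mathbb{N}_h| \Big[ \sqrt{\frac{t + 2\log(|\mathbb{N}_h|)}{|S^h|}} + \frac{t + 2\log(|\mathbb{N}_h|)}{|S^h|} \Big] \\
&\le 4\|\Gamma(\phi)\| \frac{|\mathbb{N}_h|}{|S^h|^{1/2}} \big(\sqrt{t} + \log(|\mathbb{N}_h|)\big) ,
\end{aligned}
\]

with probability larger than $1 - 2e^{-t}$. Hence, under this event,

\[ \lambda^{\max}\Big( \Gamma_{\mathbb{N}_h}(\phi)^{-1/2} \frac{F_{S,h}^\top F_{S,h}}{|S^h|} \Gamma_{\mathbb{N}_h}(\phi)^{-1/2} \Big) \le 1 + 4\|\Gamma^{-1}(\phi)\| \|\Gamma(\phi)\| \frac{|\mathbb{N}_h|}{|S^h|^{1/2}} \big(\sqrt{t} + \log(|\mathbb{N}_h|)\big) , \]

since $\|\Gamma_{\mathbb{N}_h}(\phi)^{-1}\| \le \|\Gamma^{-1}(\phi)\|$. This concludes the proof of (59).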

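Proof of (60). Turning to the second deviation bound, we use the following decomposition

\[ \big\| \Gamma_{\mathbb{N}_h}(\phi)^{-1/2} F_{S,h}^\top \epsilon_{S,h} \big\|_2^2 = \sum_{i \in S^h} \epsilon_i^2\, F_i^\top \Gamma_{\mathbb{N}_h}(\phi)^{-1} F_i + \sum_{i \ne j \in S^h} \epsilon_i \epsilon_j\, F_i^\top \Gamma_{\mathbb{N}_h}(\phi)^{-1} F_j =: A + B , \]

with $\epsilon_i$ being the $i$th entry of $\epsilon_{S,h}$. Since both $A$ and $B$ are Gaussian chaos variables of order 4, we
apply Lemma 2 to control their deviations. For any $t > 0$,

\[ \mathbb{P}\Big\{ A + B \le \mathbb{E}[A + B] - C\big(\mathrm{Var}^{1/2}(A) + \mathrm{Var}^{1/2}(B)\big) t^2 \Big\} \le 2e^{-t} , \tag{65} \]

using the fact that $\mathrm{Var}^{1/2}(A + B) \le \mathrm{Var}^{1/2}(A) + \mathrm{Var}^{1/2}(B)$. Thus, it suffices to compute the
expectation and variance of $A$ and $B$.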

First, we have $\mathbb{E}[A] = |S^h| |\mathbb{N}_h| \sigma_\phi^2$ by independence of $\epsilon_i$ and $F_i$, and from this we get


Var(74) 


E 

i.jeS''* 


E 
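If $|i - j|_\infty \le h$, we may use the Cauchy-Schwarz inequality to bound the corresponding term of the sum by

\[ \mathbb{E}\Big[ \epsilon_i^4 \big\|\Gamma_{\mathbb{N}_h}(\phi)^{-1/2} F_i\big\|_2^4 \Big] = 3\sigma_\phi^4 |\mathbb{N}_h|(|\mathbb{N}_h| + 2) , \]

again by independence of $\epsilon_i$ and $F_i$. If $|i - j|_\infty > h$, then $\epsilon_i$ is independent of
$(F_i, F_j, \epsilon_j)$ and $\epsilon_j$ is independent of $(F_i, F_j, \epsilon_i)$, so we get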
A, 

a 


= E 


||rM,(0)-'/2F,||i||rN,(0)-'/2F 


2 
ill2 


— ^ ^ (rN(,('/’) )fi,V2(I^Nh(0) ')vz,V4,\lvi—V2'yv3—ViF^i-\-v-i—j—V2.'yi+v2—j—V4,F^i+vi—j—Vi'yi+v3—j—V'2\ 

V\,V2,VZ,Vi&ih 

-|%|2 

= 'y ^ (rNh('/’) )fi,V2(I^Nh(0) ')vz,V4,\li+v-i-j—V2'yi+v2,—j—V4,F^i+v-i—j—Vi'yi+v3—j—V2\ 


■yi,'y2,'i’3,'i’4SNh 


where we apply Isserlis' theorem in the second line and use the definition of $\Gamma_{\mathbb{N}_h}(\phi)$ in the last line.
By symmetry, we get



\li+v-i-j-vzli+v2-j-V4, 

Vl,V2,VZ,Vi&^h 

< 2||rM,(</.)-'f|N;,| it,+.,-.2 

Vl,V2&^h 

< 2||r-H</>)f |N;,|2 , 

v&izh 


using the Cauchy-Schwarz inequality in the second line. Here $\|A\|_\infty$ denotes the supremum norm
of the entries of $A$. Then, summing over all $j$ lying at a distance larger than $h$ from $i$,

E ^ s 2iir--(0)ifflN,.r^ E E 

,\j—i\ac,>h ‘f’ j&S^,\j—i\oo>h v&Nzh 

< 2‘'+l||r-l(</>)f|N;,|3 J] 7|. 

ieZ'*\{o} 

Putting the terms together, we conclude that 

Var(A)< 415^^1 |N;,|M 6+ 2'^+^ J] yl). (66) 

V iezFfo} / 

Next we bound the first two moments of $B$. Consider $(i, j) \in (S^h)^2$ such that $|i - j|_\infty > h$.
Then $\mathbb{E}[\epsilon_i \epsilon_j F_i^\top \Gamma_{\mathbb{N}_h}(\phi)^{-1} F_j] = 0$ by independence of $\epsilon_i$ with the other variables in the expectation.
Suppose now that $|i - j|_\infty \le h$. By Isserlis' theorem, the independence of $\epsilon_i$ and $F_i$, as well as of
$\epsilon_j$ and $F_j$, and symmetry, we get


E 


ejCjFl" Tr 




= E [ejFj] Tn^ ((/>) ^ E [FjCj] -b E [cjej] E 
> l|rNj</>)“i -al\ 4 >i-j\ 




-1 
using the Cauchy-Schwarz inequality and Lemma 7. As a consequence,


E[B| > -»||S'‘|||^||il|r-l{,#,)|| -4|S'>||Nft|||^||. . (67) 

Turning to the variance, we obtain 

Var(i?) < E[B^] = E E , 


where 

^l,*2,*3,*4 -^*2-^73 (*?^) ■ 

Fix $i_1$. If one index among $(i_1, i_2, i_3, i_4)$ lies at a distance larger than $h$ from the three others, then
the expectation of $V_{i_1,i_2,i_3,i_4}$ is equal to zero. If one index lies within distance $h$ of $i_1$ and the two
remaining indices lie within distance $3h$ of $i_1$, we use the Cauchy-Schwarz inequality to get




*2,*3,*4J — 


< E 




< E|£*(FTrK.{,#,)-‘F,)'"] = 34 |n»|(|n»| + 2 ). 


k=l 


Finally, if say $|i_1 - i_2|_\infty \le h$ and $|i_3 - i_4|_\infty \le h$ and $|i_k - i_\ell|_\infty > h$ for $k = 1, 2$ and $\ell = 3, 4$, then we
use again Isserlis' theorem and simplify the terms to get

®'[^l,*2,*3,*4] ~ ®[e7ie72] ^'[^ 73 ^* 4 ] ^'[-^71 -^*2-^73 ^N/i (‘^) -^ 74 ] 

+ E[e.2i^T]rNj</.)-iE[F,2e.jE[e.,i^;]rNj</>)-'E[F,,e.3] 

where we used again Lemma 7 to control the terms involving the $\epsilon$'s and the Cauchy-Schwarz inequality
to bound the term in $(F_{i_k}, k = 1, \ldots, 4)$. Putting all the terms together, we conclude that

Var(B) < + \S’^\^al\\m\\T-\cP)f) , (68) 

since < Var[yj] = 1. 

Plugging in the bounds that we obtained for the moments of A and B in (65), we conclude the 
proof of (60). 


7.8.4 Proof of Lemma 13 

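Recall the definition of $S_i$ in (51). We decompose $\|\epsilon_{S,h}\|_2^2 = \sum_{i \in \mathbb{N}_h \cup \{0\}} \|\epsilon_{S_i}\|_2^2$ and note that
$\|\epsilon_{S_i}\|_2^2/\sigma_\phi^2 \sim \chi^2(|S_i|)$. Applying the second deviation bound of Lemma 1 together with a union bound,
we obtain that for any $t > 0$,

\[
\begin{aligned}
\frac{\|\epsilon_{S,h}\|_2^2}{\sigma_\phi^2} &\le \sum_{i \in \mathbb{N}_h \cup \{0\}} \Big( |S_i| + 2\sqrt{|S_i|\big(\log(|\mathbb{N}_h| + 1) + t\big)} + 2t + 2\log(|\mathbb{N}_h| + 1) \Big) \\
&\le |S^h| + 2\sqrt{|S^h|(|\mathbb{N}_h| + 1)\big(\log(|\mathbb{N}_h| + 1) + t\big)} + 2(|\mathbb{N}_h| + 1)\big(t + \log(|\mathbb{N}_h| + 1)\big) ,
\end{aligned}
\]

with probability larger than $1 - e^{-t}$. Relying on Condition (29), we derive that

\[ \mathbb{P}\Big\{ \|\epsilon_{S,h}\|_2^2 \le \sigma_\phi^2 |S^h| \big(1 + |\mathbb{N}_h|^{-1/2}\big) \Big\} \ge 1 - \exp\Big( -C \frac{|S^h|}{|\mathbb{N}_h|^2} \Big) , \]

for a numerical constant $C > 0$ small enough.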
7.9 Proof of Corollary 1


It is well known (see, e.g., Lauritzen (1996)) that any AR$_h$ process is also a Gaussian Markov
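random field with neighborhood radius $h$ (and vice-versa). Denote by $\sigma_\psi^2$ the innovation variance of
an AR$_h(\psi)$ process. The bijection between the parameterizations $(\psi, \sigma_\psi^2)$ and $(\phi, \sigma_\phi^2)$ is given by
the following equations

\[ \phi_{-i} = \phi_i = \frac{\psi_i - \sum_{k=i+1}^{h} \psi_k \psi_{k-i}}{1 + \|\psi\|_2^2} , \qquad \text{for } i = 1, \ldots, h , \tag{69} \]

\[ \sigma_\phi^2 = \frac{\sigma_\psi^2}{1 + \|\psi\|_2^2} . \tag{70} \]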

This correspondence is maintained below. 
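The correspondence can be sanity-checked in the spectral domain. The sketch below assumes that (69) and (70) take the standard form $\phi_i = (\psi_i - \sum_{k=i+1}^{h} \psi_k \psi_{k-i})/(1 + \|\psi\|_2^2)$ and $\sigma_\phi^2 = (\text{innovation variance})/(1 + \|\psi\|_2^2)$, for an autoregression written as $Y_t = \sum_{k=1}^{h} \psi_k Y_{t-k} + e_t$. The function name `ar_to_gmrf` and the numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ar_to_gmrf(psi, innovation_var):
    """Map AR(h) coefficients to (phi_1..phi_h, conditional variance) of the GMRF form."""
    psi = np.asarray(psi, dtype=float)
    h = len(psi)
    denom = 1.0 + psi @ psi
    phi = np.empty(h)
    for i in range(1, h + 1):
        # sum_{k=i+1}^{h} psi_k * psi_{k-i}, written with 0-based slices
        cross = psi[i:] @ psi[: h - i]
        phi[i - 1] = (psi[i - 1] - cross) / denom
    return phi, innovation_var / denom

psi = np.array([0.5, -0.2, 0.1])          # hypothetical AR(3) coefficients
phi, sigma2 = ar_to_gmrf(psi, 1.0)

# Spectral check: |1 - sum_k psi_k e^{-ik w}|^2 / (1 + ||psi||^2)
# should equal 1 - 2 * sum_i phi_i cos(i w) at every frequency w.
omega = np.linspace(0.0, np.pi, 64)
k = np.arange(1, len(psi) + 1)
ar_poly = 1.0 - np.exp(-1j * np.outer(omega, k)) @ psi
lhs = np.abs(ar_poly) ** 2 / (1.0 + psi @ psi)
rhs = 1.0 - 2.0 * np.cos(np.outer(omega, k)) @ phi

phi1, _ = ar_to_gmrf([0.5], 1.0)          # AR(1): phi_1 = psi / (1 + psi^2)
```

The two sides agree because the inverse spectral density of the autoregression is proportional to the squared modulus of its characteristic polynomial, whose normalized coefficients are exactly $-\phi_i$.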

Lower bound. In this proof, C is a positive constant that may vary from line to line. It follows 
from the above equations that 

2.r. \\nl + hU\\\ 

i + liv-lli ■ 


Consider any r < 1/h. In that case, if ||i?i||2 > r then the inequality above implies that ||'!/’||2 > 
Cr, and as a consequence, < ^cs{hCr)' Therefore, since (7) and our condition on h 

together imply that r < 1/h eventually, it suffices to prove that —>• 1- For that, we 

apply Corollary 4. Condition (34) there is satisfied eventually under our assumptions ((7) and our 
condition on h). Consequently, we have R/, 1 as soon as (35) holds, which is the case when 

(7) holds. 


Upper bound. It follows from (70) and the inequality < 1 that 



> 


1 + 11^111 


Denoting $u_n := \log(n)/k + \sqrt{h\log(n)/k}$, observe as above that $u_n \le 1/h$ by our assumption on $h$.

Assume that $\|\psi\|_2^2 \ge r^2$ for some $r^2 \ge u_n$. If $\|\phi\|_1 \le 1/2$, it follows from the inequality
$1 - \sigma_\phi^2 \le \|\phi\|_2^2/(1 - \|\phi\|_1) \le 2\|\phi\|_2^2$ (Lemma 5) that $\|\phi\|_2^2 \ge r^2/4$. And if $\|\phi\|_1 > 1/2$, then
$\|\phi\|_2^2 \ge (8h)^{-1}$ by the Cauchy-Schwarz inequality. Thus, when $r^2 \le 1/h$, we have $\|\phi\|_2^2 \ge r^2/8$, and
this implies

RcMh,r)if) < Rc,0{h,r/V8)(f) ’ f' 

When > 1/h, we simply use a monotonicity argument 


Rc,S(h,r)if) < ^c,5(/^,/^-l/2 )(/) < RcMh,i/VM)if) ’ any test /. 
The result then follows from Theorem 2. 